1
AlJarf R, Rodrigues CHM, Myung Y, Pires DEV, Ascher DB. piscesCSM: prediction of anticancer synergistic drug combinations. J Cheminform 2024; 16:81. [PMID: 39030592 PMCID: PMC11264925 DOI: 10.1186/s13321-024-00859-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 05/12/2024] [Indexed: 07/21/2024] Open
Abstract
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel predictive tool, piscesCSM, that leverages graph-based representations to model small molecule chemical structures to accurately predict drug combinations with favourable anticancer synergistic effects against one or multiple cancer cell lines. Leveraging these insights, we developed a general supervised machine learning model to guide the prediction of anticancer synergistic drug combinations in over 30 cell lines. It achieved an area under the receiver operating characteristic curve (AUROC) of up to 0.89 on independent non-redundant blind tests, outperforming state-of-the-art approaches on both large-scale oncology screening data and an independent test set generated by AstraZeneca (with more than a 16% improvement in predictive accuracy). Moreover, by exploring the interpretability of our approach, we found that simple physicochemical properties and graph-based signatures are predictive of chemotherapy synergism. To provide a simple and integrated platform to rapidly screen potential candidate pairs with favourable synergistic anticancer effects, we made piscesCSM freely available online at https://biosig.lab.uq.edu.au/piscescsm/ as a web server and API. We believe that our predictive tool will provide a valuable resource for optimizing and augmenting combinatorial screening libraries to identify effective and safe synergistic anticancer drug combinations. 
SCIENTIFIC CONTRIBUTION: This work proposes piscesCSM, a machine-learning-based framework that relies on well-established graph-based representations of small molecules to identify synergistic drug combinations with improved predictive accuracy. Our model, piscesCSM, shows that combining physicochemical properties with graph-based signatures can outperform current architectures on classification prediction tasks. Furthermore, implementing our tool as a web server offers a user-friendly platform for researchers to screen for potential synergistic drug combinations with favourable anticancer effects against one or multiple cancer cell lines.
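The headline figure in this abstract is an AUROC. As a reminder of what that number measures, the following is a minimal sketch, not piscesCSM code: it assumes binary synergy labels (1 = synergistic) and made-up model scores, and computes AUROC as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as half.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC in its pairwise (Mann-Whitney) form; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.7]
print(auroc(labels, scores))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a small illustration; a rank-based implementation would be the usual choice for large screening sets.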
Affiliation(s)
- Raghad AlJarf
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Carlos H M Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
2
Martins P, Mariano D, Carvalho FC, Bastos LL, Moraes L, Paixão V, Cardoso de Melo-Minardi R. Propedia v2.3: A novel representation approach for the peptide-protein interaction database using graph-based structural signatures. Front Bioinform 2023; 3:1103103. [PMID: 36875148 PMCID: PMC9978205 DOI: 10.3389/fbinf.2023.1103103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Accepted: 01/30/2023] [Indexed: 02/18/2023] Open
Affiliation(s)
- Pedro Martins
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Diego Mariano
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Frederico Chaves Carvalho
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Luana Luiza Bastos
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Lucas Moraes
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vivian Paixão
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Raquel Cardoso de Melo-Minardi
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
3
Yang L, He W, Yun Y, Gao Y, Zhu Z, Teng M, Liang Z, Niu L. Defining A Global Map of Functional Group-based 3D Ligand-binding Motifs. Genomics Proteomics Bioinformatics 2022; 20:765-779. [PMID: 35288344 PMCID: PMC9881048 DOI: 10.1016/j.gpb.2021.08.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2020] [Revised: 06/30/2021] [Accepted: 09/27/2021] [Indexed: 01/31/2023]
Abstract
Uncovering conserved 3D protein-ligand binding patterns on the basis of functional groups (FGs) shared by a variety of small molecules can greatly expand our knowledge of protein-ligand interactions. Although conserved binding patterns for a few commonly used FGs have been reported in the literature, large-scale identification and evaluation of FG-based 3D binding motifs are still lacking. Here, we propose a computational method, Automatic FG-based Three-dimensional Motif Extractor (AFTME), for automatic mapping of 3D motifs to different FGs of a specific ligand. Applying our method to 233 naturally occurring ligands, we define 481 FG-binding motifs that are highly conserved across different ligand-binding pockets. Systematic analysis further reveals four main classes of binding motifs corresponding to distinct sets of FGs. Combinations of FG-binding motifs facilitate the binding of proteins to a wide spectrum of ligands with various binding affinities. Finally, we show that our FG-motif map can be used to nominate FGs that potentially bind to specific drug targets, thus providing useful insights and guidance for rational design of small-molecule drugs.
Affiliation(s)
- Liu Yang
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China
- Wei He
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China.
- Yuehui Yun
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China
- Yongxiang Gao
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China
- Zhongliang Zhu
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China
- Maikun Teng
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China
- Zhi Liang
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China.
- Liwen Niu
- School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China; Division of Molecular and Cellular Biophysics, Hefei National Laboratory for Physical Sciences at the Microscale, Hefei 230026, China.
4
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022] Open
Abstract
The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
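The three schemes contrasted in this abstract differ mainly in how training targets are derived from the hierarchy. The sketch below is an illustration under stated assumptions rather than the authors' pipeline: labels are assumed to be dotted paths in the style of CATH codes, the example codes are hypothetical, the function names are invented for this note, and "Local per Node" is taken to mean one binary model per node.

```python
def global_targets(paths):
    """Global scheme: a single flat model whose classes are the full paths."""
    return list(paths)


def local_per_level_targets(paths):
    """Local per Level: one multi-class model per depth of the hierarchy."""
    depth = max(len(p.split(".")) for p in paths)
    return {
        d: [".".join(p.split(".")[:d]) for p in paths]
        for d in range(1, depth + 1)
    }


def local_per_node_targets(paths):
    """Local per Node: one binary model per node (is the sample under it?)."""
    nodes = set()
    for p in paths:
        parts = p.split(".")
        for d in range(1, len(parts) + 1):
            nodes.add(".".join(parts[:d]))
    return {
        node: [int(p == node or p.startswith(node + ".")) for p in paths]
        for node in sorted(nodes)
    }


# Hypothetical hierarchical labels, for illustration only.
paths = ["1.10.8", "1.10.20", "2.40.50"]
print(local_per_level_targets(paths)[2])
print(len(local_per_node_targets(paths)), "binary models")
```

Even on three samples the per-node scheme needs more models than the per-level scheme, which is the kind of resource trade-off the abstract's guidelines are meant to address.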
Affiliation(s)
- Pâmela M Rezende
- Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Stilingue Inteligência Artificial
- Joicymara S Xavier
- Universidade Federal de Minas Gerais; Instituto René Rachou, Fundação Oswaldo Cruz; Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
- David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland; Systems and Computational Biology, Bio21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute; School of Computing and Information Systems, University of Melbourne
5
Rodrigues CHM, Ascher DB. CSM-Potential: mapping protein interactions and biological ligands in 3D space using geometric deep learning. Nucleic Acids Res 2022; 50:W204-W209. [PMID: 35609999 PMCID: PMC9252741 DOI: 10.1093/nar/gkac381] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/24/2022] [Revised: 04/19/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022] Open
Abstract
Recent advances in protein structural modelling have enabled the accurate prediction of the holo 3D structures of almost any protein; however, protein function is intrinsically linked to the interactions it makes. While a number of computational approaches have been proposed to explore potential biological interactions, they have been limited to specific interactions, and have not been readily accessible to non-experts or for use in bioinformatics pipelines. Here we present CSM-Potential, a geometric deep learning approach to identify regions of a protein surface that are likely to mediate protein-protein and protein-ligand interactions in order to provide a link between 3D structure and biological function. Our method has shown robust performance, outperforming existing methods for both predictive tasks. By assessing the performance of CSM-Potential on independent blind tests, we show that our method was able to achieve ROC AUC values of up to 0.81 for the identification of potential protein-protein binding sites, and up to 0.96 accuracy on biological ligand classification. Our method is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
6
Santana CA, Izidoro SC, de Melo-Minardi RC, Tyzack JD, Ribeiro AJM, Pires DEV, Thornton JM, de A Silveira S. GRaSP-web: a machine learning strategy to predict binding sites based on residue neighborhood graphs. Nucleic Acids Res 2022; 50:W392-W397. [PMID: 35524575 PMCID: PMC9252730 DOI: 10.1093/nar/gkac323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/21/2022] [Revised: 04/14/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022] Open
Abstract
Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, and are a key step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable, as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5 h on average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSP-web is freely available at https://grasp.ufv.br.
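Performance here is summarised with the Matthews correlation coefficient (MCC). For readers less familiar with the metric, a minimal sketch follows; the confusion-matrix counts are hypothetical and are not taken from the GRaSP evaluation.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts of binding-site residues, for illustration only.
print(mcc(tp=30, fp=10, tn=50, fn=10))
```

Unlike plain accuracy, MCC stays informative when binding-site residues are a small minority of all residues, which is presumably why residue-centric predictors are commonly compared on it.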
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil; Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil; Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Douglas E V Pires
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Australia
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
7
de Castro Barbosa E, Alves TMA, Kohlhoff M, Jangola STG, Pires DEV, Figueiredo ACC, Alves ÉAR, Calzavara-Silva CE, Sobral M, Kroon EG, Rosa LH, Zani CL, de Oliveira JG. Searching for plant-derived antivirals against dengue virus and Zika virus. Virol J 2022; 19:31. [PMID: 35193667 PMCID: PMC8861615 DOI: 10.1186/s12985-022-01751-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/08/2021] [Accepted: 01/23/2022] [Indexed: 12/21/2022] Open
Abstract
Background: The worldwide epidemics of diseases such as dengue and Zika have triggered an intense effort to repurpose drugs and search for novel antivirals to treat patients, as no approved drugs for these diseases are currently available. Our aim was to screen plant-derived extracts to identify and isolate compounds with antiviral properties against dengue virus (DENV) and Zika virus (ZIKV).
Methods: Seven thousand plant extracts were screened in vitro for their antiviral properties against DENV-2 and ZIKV by their viral cytopathic effect reduction followed by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) method, previously validated for this purpose. Selected extracts were submitted to bioactivity-guided fractionation using high- and ultrahigh-pressure liquid chromatography. In parallel, high-resolution mass spectrometric data (MSn) were collected from each fraction, allowing compounds in the active fractions to be tracked in subsequent fractionation procedures. The virucidal activity of extracts and compounds was assessed by using the plaque reduction assay. EC50 and CC50 were determined by dose response experiments, and the ratio (EC50/CC50) was used as a selectivity index (SI) to measure the antiviral vs. cytotoxic activity. Purified compounds were used in nuclear magnetic resonance spectroscopy to identify their chemical structures. Two compounds were associated in different proportions and submitted to bioassays against both viruses to investigate possible synergy. In silico predictions of the pharmacokinetic and toxicity (ADMET) properties of the antiviral compounds were calculated using the pkCSM platform.
Results: We detected antiviral activity against DENV-2 and ZIKV in 21 extracts obtained from 15 plant species. Hippeastrum (Amaryllidaceae) was the most represented genus, affording seven active extracts. Bioactivity-guided fractionation of several extracts led to the purification of lycorine, pretazettine, narciclasine, and narciclasine-4-O-β-D-xylopyranoside (NXP). Another 16 compounds were identified in active fractions. Association of lycorine and pretazettine did not improve their antiviral activity against either DENV-2 or ZIKV. ADMET prediction suggested that these four compounds may have good metabolism and no mutagenic toxicity. Predicted oral absorption, distribution, and excretion parameters of lycorine and pretazettine suggest that they are candidates to be tested in animal models.
Conclusions: Our results showed that plant extracts, especially those from the Hippeastrum genus, can be a valuable source of antiviral compounds against ZIKV and DENV-2. The majority of compounds identified have not previously been described for their activity against ZIKV and other viruses.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12985-022-01751-z.
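The Methods describe a selectivity index built from EC50 and CC50. The abstract writes the ratio as EC50/CC50, whereas antiviral studies more commonly report SI as CC50/EC50, where a larger value indicates a wider margin between antiviral and cytotoxic concentrations. The sketch below exposes both conventions; it assumes both values are in the same concentration units, and the numbers are hypothetical rather than taken from the study.

```python
def selectivity_index(ec50, cc50, convention="cc50_over_ec50"):
    """Ratio of cytotoxic to effective concentration, or its inverse.

    ec50 and cc50 must be positive and in the same units (for example µM).
    """
    if ec50 <= 0 or cc50 <= 0:
        raise ValueError("EC50 and CC50 must be positive")
    if convention == "cc50_over_ec50":
        return cc50 / ec50  # larger means more selective
    if convention == "ec50_over_cc50":
        return ec50 / cc50  # the form written in the abstract
    raise ValueError(f"unknown convention: {convention}")


# Hypothetical dose-response results, for illustration only.
print(selectivity_index(ec50=0.5, cc50=25.0))
print(selectivity_index(ec50=0.5, cc50=25.0, convention="ec50_over_cc50"))
```

Whichever form is used, stating the convention next to the reported values avoids confusing a highly selective compound with a poorly selective one.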
Affiliation(s)
- Emerson de Castro Barbosa
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Tânia Maria Almeida Alves
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Markus Kohlhoff
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Soraya Torres Gaze Jangola
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Douglas Eduardo Valente Pires
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, 3052, Australia
- Anna Carolina Cançado Figueiredo
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Érica Alessandra Rocha Alves
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Carlos Eduardo Calzavara-Silva
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil
- Marcos Sobral
- Departamento de Ciências Naturais, Universidade Federal de São João del-Rei, Campus Dom Bosco - Praça Dom Helvécio, 74, São João del-Rei, Minas Gerais, 36301-160, Brasil
- Erna Geessien Kroon
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av Antônio Carlos 6627, Belo Horizonte, Minas Gerais, 31270-901, Brasil
- Luiz Henrique Rosa
- Departamento de Microbiologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Av Antônio Carlos 6627, Belo Horizonte, Minas Gerais, 31270-901, Brasil
- Carlos Leomar Zani
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil.
- Jaquelline Germano de Oliveira
- Instituto René Rachou - Fiocruz Minas, Fundação Oswaldo Cruz, Av. Augusto de Lima 1715, Belo Horizonte, Minas Gerais, 30190-002, Brasil.
8
Rodrigues CHM, Pires DEV, Ascher DB. pdCSM-PPI: Using Graph-Based Signatures to Identify Protein-Protein Interaction Inhibitors. J Chem Inf Model 2021; 61:5438-5445. [PMID: 34719929 DOI: 10.1021/acs.jcim.1c01135] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions are promising sites for development of selective drugs; however, they have generally been viewed as challenging targets. Molecules targeting protein-protein interactions tend to be larger and more lipophilic than other drug-like molecules, mimicking the properties of interacting interfaces. Here, we propose a machine learning approach that uses a graph-based representation of small molecules to guide identification of inhibitors modulating protein-protein interactions, pdCSM-PPI. This approach was applied to 21 different PPI targets. We developed interaction-specific models that were able to accurately identify active compounds achieving MCC and F1 scores up to 1, and Pearson's correlations up to 0.87, outperforming previous approaches. Using insights from these individual models, we developed a generic protein-protein interaction modulator predictive model, which accurately predicted IC50 with a Pearson's correlation of 0.64 on a low redundancy blind test. Importantly, we were able to accurately identify active from inactive compounds, achieving an AUC of 0.77 and sensitivity and specificity of 76% and 78%, respectively. We believe pdCSM-PPI will be an important tool to help guide more efficient screening of new PPI inhibitors; it is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/pdcsm_ppi.
Affiliation(s)
- Carlos H M Rodrigues
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
- Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia; School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane 4072, Australia
9
Velloso JPL, Ascher DB, Pires DEV. pdCSM-GPCR: predicting potent GPCR ligands with graph-based signatures. Bioinform Adv 2021; 1:vbab031. [PMID: 34901870 PMCID: PMC8651072 DOI: 10.1093/bioadv/vbab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/27/2021] [Revised: 09/30/2021] [Accepted: 11/02/2021] [Indexed: 01/26/2023]
Abstract
MOTIVATION: G protein-coupled receptors (GPCRs) can selectively bind to many types of ligands, including light-sensitive compounds, ions, hormones, pheromones and neurotransmitters, modulating cell physiology. Considering their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel, specific and safe GPCR ligands a challenge. While computational approaches have the potential to assist GPCR drug development, they have presented limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands.
RESULTS: Bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs were curated. After curation, we used the data for developing predictive models for 36 major GPCR targets, across 4 classes (A, B, C and F). Our models compose the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson's correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCR ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation.
AVAILABILITY AND IMPLEMENTATION: pdCSM-GPCR predictive models and datasets used have been made available via a freely accessible and easy-to-use web server at http://biosig.unimelb.edu.au/pdcsm_gpcr/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
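Regression performance in this abstract is reported as Pearson's correlation between predicted and measured bioactivity. A minimal sketch of that calculation follows; the activity values are hypothetical and the function is an illustration, not part of pdCSM-GPCR.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted activities (for example pKi values).
measured = [6.1, 7.4, 5.2, 8.0, 6.8]
predicted = [6.0, 7.0, 5.9, 7.6, 6.5]
print(round(pearson_r(measured, predicted), 3))
```

Pearson's r captures how well predictions track the measured ranking and spread, but not absolute error, so it is usually read alongside an error measure such as RMSE.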
Affiliation(s)
- João Paulo L Velloso
- Fundação Oswaldo Cruz, Instituto René Rachou, Belo Horizonte 30190-009, Brazil
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne 3052, Australia
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne 3053, Australia
| |
Collapse
|
10
|
Pimentel V, Mariano D, Cantão LXS, Bastos LL, Fischer P, de Lima LHF, Fassio AV, de Melo-Minardi RC. VTR: A Web Tool for Identifying Analogous Contacts on Protein Structures and Their Complexes. Frontiers in Bioinformatics 2021; 1:730350. [PMID: 36303745 PMCID: PMC9581016 DOI: 10.3389/fbinf.2021.730350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Accepted: 07/27/2021] [Indexed: 11/19/2022] Open
Abstract
Evolutionarily related proteins can present similar structures but very dissimilar sequences. Hence, understanding the role of inter-residue contacts in protein structure has been the target of many studies. Contacts comprise non-covalent interactions, which are essential to stabilize macromolecular structures such as proteins. Here we present VTR, a new method for the detection of analogous contacts in protein pairs. The VTR web tool performs structural alignment between proteins and detects interactions that occur in similar regions. To evaluate our tool, we proposed three case studies: we 1) compared vertebrate myoglobin and truncated invertebrate hemoglobin; 2) analyzed interactions between the spike protein RBD of SARS-CoV-2 and the cell receptor ACE2; and 3) compared a glucose-tolerant and a non-tolerant β-glucosidase enzyme used for biofuel production. The case studies demonstrate the potential of VTR for the understanding of functional similarities between distantly sequence-related proteins, as well as the exploration of important drug targets and rational design of enzymes for industrial applications. We envision VTR as a promising tool for understanding differences and similarities between homologous proteins with similar 3D structures but different sequences. VTR is available at http://bioinfo.dcc.ufmg.br/vtr.
Affiliation(s)
- Vitor Pimentel
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Diego Mariano
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Letícia Xavier Silva Cantão
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Luana Luiza Bastos
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Pedro Fischer
  - Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Universidade Federal de São João Del-Rei, Sete Lagoas, Brazil
- Leonardo Henrique Franca de Lima
  - Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Universidade Federal de São João Del-Rei, Sete Lagoas, Brazil
- Alexandre Victor Fassio
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Raquel Cardoso de Melo-Minardi
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  - *Correspondence: Raquel Cardoso de Melo-Minardi
|
11
|
da Silva BM, Myung Y, Ascher DB, Pires DEV. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform 2021; 23:6407730. [PMID: 34676398 DOI: 10.1093/bib/bbab423] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 06/07/2021] [Revised: 08/25/2021] [Accepted: 09/14/2021] [Indexed: 11/13/2022] Open
Abstract
The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have presented limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes, trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show epitope3D outperforms available alternative approaches, achieving Matthews correlation coefficient and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.
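The distance-pattern idea behind graph-based signatures can be illustrated with a toy example. The sketch below is a simplified illustration only, not the epitope3D implementation: the function name `distance_signature`, the coordinates and the cutoff values are all hypothetical, and real signatures typically also label nodes by properties such as atom or residue type. For each distance cutoff it counts how many node pairs lie within that cutoff, producing a cumulative vector that could serve as input features for a classifier.

```python
from itertools import combinations
from math import dist


def distance_signature(coords, cutoffs):
    """Return, for each cutoff, the number of node pairs within that distance.

    coords  : list of (x, y, z) tuples (hypothetical node positions)
    cutoffs : increasing list of distance thresholds (hypothetical values)
    """
    # Euclidean distance for every unordered pair of nodes.
    pair_distances = [dist(a, b) for a, b in combinations(coords, 2)]
    # Cumulative counts: one entry per cutoff.
    return [sum(d <= c for d in pair_distances) for c in cutoffs]


# Toy input: three points whose pairwise distances are 3, 4 and 5.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
signature = distance_signature(points, cutoffs=[2.0, 3.5, 4.5, 6.0])
print(signature)  # [0, 1, 2, 3]
```

Because the counts are cumulative, the vector never decreases as the cutoff grows, and its last entry equals the total number of pairs once the cutoff exceeds the largest distance.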
Affiliation(s)
- Bruna Moreira da Silva
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- YooChan Myung
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
  - Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
|
12
|
Portelli S, Myung Y, Furnham N, Vedithi SC, Pires DEV, Ascher DB. Prediction of rifampicin resistance beyond the RRDR using structure-based machine learning approaches. Sci Rep 2020; 10:18120. [PMID: 33093532 PMCID: PMC7581776 DOI: 10.1038/s41598-020-74648-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/11/2020] [Accepted: 09/21/2020] [Indexed: 01/23/2023] Open
Abstract
Rifampicin resistance is a major therapeutic challenge, particularly in tuberculosis, leprosy, P. aeruginosa and S. aureus infections, where it develops via missense mutations in the rpoB gene. Previously we have highlighted that these mutations reduce protein affinities within the RNA polymerase complex, subsequently reducing nucleic acid affinity. Here, we have used these insights to develop a computational rifampicin resistance predictor capable of identifying resistant mutations even outside the well-defined rifampicin resistance determining region (RRDR), using clinical M. tuberculosis sequencing information. Our tool correctly identified up to 90.9% of M. tuberculosis rpoB variants, with sensitivity of 92.2%, specificity of 83.6% and MCC of 0.69, outperforming the current gold-standard GeneXpert-MTB/RIF. We show our model can be translated to other clinically relevant organisms: M. leprae, P. aeruginosa and S. aureus, despite weak sequence identity. Our method was implemented as an interactive tool, SUSPECT-RIF (StrUctural Susceptibility PrEdiCTion for RIFampicin), freely available at https://biosig.unimelb.edu.au/suspect_rif/.
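Metrics of this kind are derived from a binary confusion matrix of resistant versus susceptible variants. A minimal sketch follows, assuming the standard definitions of sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC); the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when resistant and susceptible variants are unevenly represented.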
Affiliation(s)
- Stephanie Portelli
  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- Yoochan Myung
  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
- Nicholas Furnham
  - Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Douglas E V Pires
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
  - School of Computing and Information Systems, University of Melbourne, Victoria, 3010, Australia
- David B Ascher
  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Victoria, 3010, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, 3004, VIC, Australia
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
|
13
|
Abstract
Mutations in protein-coding regions can lead to large biological changes and are associated with genetic conditions, including cancers and Mendelian diseases, as well as drug resistance. Although whole genome and exome sequencing help to elucidate potential genotype-phenotype correlations, there is a large gap between the identification of new variants and deciphering their molecular consequences. A comprehensive understanding of these mechanistic consequences is crucial to better understand and treat diseases in a more personalized and effective way. This is particularly relevant considering estimates that over 80% of mutations associated with a disease are incorrectly assumed to be causative. A thorough analysis of potential effects of mutations is required to correctly identify the molecular mechanisms of disease and enable the distinction between disease-causing and non-disease-causing variation within a gene. Here we present an overview of our integrative mutation analysis platform, which focuses on refining the current genotype-phenotype correlation methods by using the wealth of protein structural information.
|
14
|
Proteus: An algorithm for proposing stabilizing mutation pairs based on interactions observed in known protein 3D structures. BMC Bioinformatics 2020; 21:275. [PMID: 32611389 PMCID: PMC7330979 DOI: 10.1186/s12859-020-03575-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/13/2019] [Accepted: 05/28/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Protein engineering has many applications for industry, such as the development of new drugs, vaccines, treatment therapies, food, and biofuel production. A common way to engineer a protein is to perform mutations in functionally essential residues to optimize their function. However, the discovery of beneficial mutations for proteins is a complex task, and experimental validation is time-consuming and costly. Hence, computational approaches have been used to propose new insights for experiments, narrowing the search space and reducing the costs. Results: In this study, we developed Proteus (an acronym for Protein Engineering Supporter), a new algorithm for proposing mutation pairs in a target 3D structure. These suggestions are based on contacts observed in other known structures from the Protein Data Bank (PDB). Proteus’ basic assumption is that if a non-interacting pair of amino acid residues in the target structure is exchanged for an interacting pair, this could enhance protein stability. This exchange is only allowed if the main-chain conformation of the residues involved in the contact is conserved. Furthermore, no steric impediment is expected between the proposed mutations and the surrounding protein atoms. To evaluate Proteus, we performed two case studies with proteins of industrial interest. In the first case study, we evaluated whether the mutations suggested by Proteus for four protein structures increase the number of inter-residue contacts. Our results suggest that most mutations proposed by Proteus increase the number of interactions within the protein. In the second case study, we used Proteus to suggest mutations for a lysozyme protein. Then, we compared Proteus’ outcomes to mutations with available experimental evidence reported in the ProTherm database. We found four mutations for which our results agree with the experimental data. This could be initial evidence that changes in the side-chain of some residues do not cause disturbances that harm protein structure stability. Conclusion: We believe that Proteus could be used in combination with other methods to give new insights into the rational development of engineered proteins. The Proteus user-friendly web-based tool is available at <http://proteus.dcc.ufmg.br>.
|
15
|
Fassio AV, Santos LH, Silveira SA, Ferreira RS, de Melo-Minardi RC. nAPOLI: A Graph-Based Strategy to Detect and Visualize Conserved Protein-Ligand Interactions in Large-Scale. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1317-1328. [PMID: 30629512 DOI: 10.1109/tcbb.2019.2892099] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 06/09/2023]
Abstract
Essential roles in biological systems depend on protein-ligand recognition, which is mostly driven by specific non-covalent interactions. Consequently, investigating these interactions contributes to understanding how molecular recognition occurs. Nowadays, a large-scale data set of protein-ligand complexes is available in the Protein Data Bank, which has led to several tools being proposed in an effort to elucidate protein-ligand interactions. Nonetheless, there is no all-in-one tool that couples large-scale statistical, visual, and interactive analysis of conserved protein-ligand interactions. Therefore, we propose nAPOLI (Analysis of PrOtein-Ligand Interactions), a web server that combines large-scale analysis of conserved interactions in protein-ligand complexes at the atomic level, interactive visual representations, and comprehensive reports of the interacting residues/atoms to detect and explore conserved non-covalent interactions. We demonstrate the potential of nAPOLI in detecting important conserved interacting residues through four case studies: two involving a human cyclin-dependent kinase 2 (CDK2), one related to ricin, and another to the human nuclear receptor subfamily 3 (hNR3). nAPOLI proved to be suitable to identify conserved interactions according to literature, as well as highlight additional interactions. Finally, we illustrate, with a virtual screening ligand selection, how nAPOLI can be widely applied in structural biology and drug design. nAPOLI is freely available at bioinfo.dcc.ufmg.br/napoli/.
|
16
|
Mariano D, Pantuza N, Santos LH, Rocha REO, de Lima LHF, Bleicher L, de Melo-Minardi RC. Glutantβase: a database for improving the rational design of glucose-tolerant β-glucosidases. BMC Mol Cell Biol 2020; 21:50. [PMID: 32611314 PMCID: PMC7329481 DOI: 10.1186/s12860-020-00293-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/16/2019] [Accepted: 06/22/2020] [Indexed: 11/22/2022] Open
Abstract
β-Glucosidases are key enzymes used in second-generation biofuel production. They act in the last step of lignocellulose saccharification, converting cellobiose into glucose. However, most β-glucosidases are inhibited by high glucose concentrations, which makes this a limiting step for industrial production. Thus, β-glucosidases have been targeted by several studies aiming to understand the mechanism of glucose tolerance, pH and thermal resistance for constructing more efficient enzymes. In this paper, we present a database of β-glucosidase structures, called Glutantβase. Our database includes 3842 GH1 β-glucosidase sequences collected from UniProt. We modeled the sequences by comparison and predicted important features in the 3D-structure of each enzyme. Glutantβase provides information about catalytic and conserved amino acids, residues of the coevolution network, protein secondary structure, and residues located in the channel that leads to the active site. We also analyzed the impact of beneficial mutations reported in the literature, predicted in analogous positions, for similar enzymes. We suggested these mutations based on six previously described mutants that showed high catalytic activity, glucose tolerance, or thermostability (A404V, E96K, H184F, H228T, L441F, and V174C). Then, we used molecular docking to verify the impact of the suggested mutations on the affinity of protein and ligands (substrate and product). Our results suggest that only mutations based on the H228T mutant can reduce the affinity for glucose (product) and increase affinity for cellobiose (substrate), which indicates an increase in resistance to product inhibition and agrees with computational and experimental results previously reported in the literature. More resistant β-glucosidases are essential to saccharification in industrial applications. However, thermostable and glucose-tolerant β-glucosidases are rare, and their glucose tolerance mechanisms appear to be related to multiple and complex factors. Here we gathered a set of information and made predictions, aiming to provide a tool for supporting the rational design of more efficient β-glucosidases. We hope that Glutantβase can help improve second-generation biofuel production. Glutantβase is available at http://bioinfo.dcc.ufmg.br/glutantbase.
Affiliation(s)
- Diego Mariano
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Naiara Pantuza
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Lucianna H Santos
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Rafael E O Rocha
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Leonardo H F de Lima
  - Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Universidade Federal de São João Del-Rei, Campus Sete Lagoas, Sete Lagoas, 35701-970, Brazil
- Lucas Bleicher
  - Protein Computational Biology Laboratory, Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Raquel Cardoso de Melo-Minardi
  - Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
|
17
|
Ribeiro VS, Santana CA, Fassio AV, Cerqueira FR, da Silveira CH, Romanelli JPR, Patarroyo-Vargas A, Oliveira MGA, Gonçalves-Almeida V, Izidoro SC, de Melo-Minardi RC, Silveira SDA. visGReMLIN: graph mining-based detection and visualization of conserved motifs at 3D protein-ligand interface at the atomic level. BMC Bioinformatics 2020; 21:80. [PMID: 32164574 PMCID: PMC7068867 DOI: 10.1186/s12859-020-3347-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022] Open
Abstract
Background: Interactions between proteins and non-protein small-molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. Results: To illustrate the potential of visGReMLIN, we conducted two case studies in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in a totally visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. Conclusions: We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can support users in gaining insights into the key atoms/residues responsible for protein-ligand interactions in a dataset of complexes.
Affiliation(s)
- Vagner S Ribeiro
  - Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Charles A Santana
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Alexandre V Fassio
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Fabio R Cerqueira
  - Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Brazil
- Carlos H da Silveira
  - Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- João P R Romanelli
  - Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Adriana Patarroyo-Vargas
  - Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Maria G A Oliveira
  - Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
  - Instituto de Biotecnologia aplicada à Agropecuária (BIOAGRO), Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
- Valdete Gonçalves-Almeida
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sandro C Izidoro
  - Department of Computer Engineering, Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira, 35903-087, Brazil
- Raquel C de Melo-Minardi
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Sabrina de A Silveira
  - Department of Computer Science, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
|
18
|
Pandurangan AP, Blundell TL. Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Protein Sci 2020; 29:247-257. [PMID: 31693276 PMCID: PMC6933854 DOI: 10.1002/pro.3774] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 08/30/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 02/02/2023]
Abstract
Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance, from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of the impacts of missense mutations on gene expression and regulation, and on the disruption of protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. We also describe DUET, which uses machine learning to combine the two approaches. We discuss briefly the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections.
STATEMENT FOR A BROADER AUDIENCE: Genetic or somatic changes in genes can lead to mutations in human proteins, which give rise to genetic disorders or cancer, or to genes of pathogens leading to drug resistance. Computer software described here, using statistical approaches or machine learning, uses the information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of gene products, the proteins, to predict impacts of mutations in genetic disease, cancer and drug resistance.
Affiliation(s)
- Arun Prasad Pandurangan
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
  - MRC Laboratory of Molecular Biology, Cambridge, UK
- Tom L. Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
|
19
|
A platform for target prediction of phenotypic screening hit molecules. J Mol Graph Model 2019; 95:107485. [PMID: 31836397 PMCID: PMC6983931 DOI: 10.1016/j.jmgm.2019.107485] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/01/2019] [Revised: 09/25/2019] [Accepted: 10/21/2019] [Indexed: 01/09/2023]
Abstract
Many drug discovery programmes, particularly for infectious diseases, are conducted phenotypically. Identifying the targets of phenotypic screening hits experimentally can be complex, time-consuming, and expensive. However, it would be valuable to know what the molecular target(s) is, as knowledge of the binding pose of the hit molecule in the binding site can facilitate compound optimisation. Furthermore, knowing the target would allow de-prioritisation of less attractive chemical series or molecular targets. To generate target hypotheses for phenotypically active compounds, an in silico platform was developed that utilises both ligand and protein-structure information to generate a ranked set of predicted molecular targets. As a result of the web-based workflow, the user obtains a set of 3D structures of the predicted targets with the active molecule bound. The platform was exemplified using Mycobacterium tuberculosis, the causative organism of tuberculosis. In a test that we performed, the platform was able to predict the targets of 60% of compounds investigated, where there was some similarity to a ligand in the protein database. An algorithm to predict the molecular target(s) of phenotypic hits against TB. Uses information based on the ligand and protein structure. Allows visualisation of the proposed binding pose. Web interface developed.
|
20
|
Zhuang Q, Holt BA, Kwong GA, Qiu P. Deconvolving multiplexed protease signatures with substrate reduction and activity clustering. PLoS Comput Biol 2019; 15:e1006909. [PMID: 31479443 PMCID: PMC6743790 DOI: 10.1371/journal.pcbi.1006909] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/27/2019] [Revised: 09/13/2019] [Accepted: 07/29/2019] [Indexed: 12/16/2022] Open
Abstract
Proteases are multifunctional, promiscuous enzymes that degrade proteins as well as peptides and drive important processes in health and disease. Current technology has enabled the construction of libraries of peptide substrates that detect protease activity, which provides valuable biological information. An ideal library would be orthogonal, such that each protease hydrolyzes only one unique substrate; however, this is impractical due to off-target promiscuity (i.e., one protease targets multiple different substrates). Therefore, when a library of probes is exposed to a cocktail of proteases, each protease activates multiple probes, producing a convoluted signature. Computational methods for parsing these signatures to estimate individual protease activities primarily use an extensive collection of all possible protease-substrate combinations, which requires impractical amounts of training data when expanding to search for more candidate substrates. Here we provide a computational method for estimating protease activities efficiently by reducing the number of substrates and clustering proteases with similar cleavage activities into families. We envision that this method will be used to extract meaningful diagnostic information from biological samples.
The activity of enzymatic proteins, which are called proteases, drives numerous important processes in health and disease, including cancer, immunity, and infectious disease. Many labs have developed useful diagnostics by designing sensors that measure the activity of these proteases. However, if we want to detect multiple proteases at the same time, it becomes impractical to design sensors that only detect one protease. This is due to a phenomenon called protease promiscuity, which means that proteases will activate multiple different sensors. Computational methods have been created to solve this problem, but the challenge is that these often require large amounts of training data. Further, completely different proteases may be detected by the same subset of sensors. In this work, we design a computational method to overcome this problem by clustering similar proteases into "subfamilies", which increases estimation accuracy. Further, our method tests multiple combinations of sensors to maintain accuracy while minimizing the number of sensors used. Together, we envision that this work will increase the amount of useful information we can extract from biological samples, which may lead to better clinical diagnostics.
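The idea of grouping proteases with similar cleavage activities can be illustrated with a minimal sketch. It is not the authors' algorithm: the greedy grouping rule, the 0.9 correlation threshold, the function names, and the activity profiles are all assumptions made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cluster_proteases(profiles, threshold=0.9):
    """Greedy grouping (an assumed rule): a protease joins the first family
    whose seed profile correlates with it at or above the threshold."""
    families = []
    for name, profile in profiles.items():
        for family in families:
            seed_profile = profiles[family[0]]
            if pearson(profile, seed_profile) >= threshold:
                family.append(name)
                break
        else:
            families.append([name])
    return families

# Hypothetical cleavage activity of three proteases against four substrates.
profiles = {
    "protease_A": [0.9, 0.1, 0.8, 0.0],
    "protease_B": [0.8, 0.2, 0.9, 0.1],
    "protease_C": [0.1, 0.9, 0.0, 0.7],
}
families = cluster_proteases(profiles)
print(families)
```

With these made-up profiles, the two proteases that cleave the same substrates fall into one family, so a downstream estimator would only need to resolve families rather than every individual protease.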
Affiliation(s)
- Qinwei Zhuang
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America
| | - Brandon Alexander Holt
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech College of Engineering and Emory School of Medicine, Atlanta, Georgia, United States of America
| | - Gabriel A. Kwong
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech College of Engineering and Emory School of Medicine, Atlanta, Georgia, United States of America
- Parker H. Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Integrated Cancer Research Center, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Georgia ImmunoEngineering Consortium, Georgia Tech and Emory University, Atlanta, Georgia, United States of America
- * E-mail: (GAK); (PQ)
| | - Peng Qiu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech College of Engineering and Emory School of Medicine, Atlanta, Georgia, United States of America
- Parker H. Petit Institute of Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- * E-mail: (GAK); (PQ)
| |
|
21
|
A Computational Method to Propose Mutations in Enzymes Based on Structural Signature Variation (SSV). Int J Mol Sci 2019; 20:ijms20020333. [PMID: 30650542 PMCID: PMC6359350 DOI: 10.3390/ijms20020333] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/20/2018] [Revised: 12/29/2018] [Accepted: 01/06/2019] [Indexed: 11/26/2022] Open
Abstract
With the use of genetic engineering, modified and sometimes more efficient enzymes can be created for different purposes, including industrial applications. However, building modified enzymes depends on several in vitro experiments, which can make the process expensive and time-consuming. Therefore, computational approaches could reduce costs and accelerate the discovery of new technological products. In this study, we present a method, called structural signature variation (SSV), to propose mutations for improving enzymes’ activity. SSV uses the structural signature variation between target enzymes and template enzymes (obtained from the literature) to determine if randomly suggested mutations may provide some benefit for an enzyme, such as improvement of catalytic activity, half-life, and thermostability, or resistance to inhibition. To evaluate SSV, we carried out a case study that suggested mutations in β-glucosidases, essential enzymes used in biofuel production that suffer inhibition by their product. We collected 27 mutations described in the literature, and manually classified them as beneficial or not. SSV was able to classify the mutations with values of 0.89 and 0.92 for precision and specificity, respectively. Then, we used SSV to propose mutations for Bgl1B, a low-performance β-glucosidase. We detected 15 mutations that could be beneficial. Three of these mutations (H228C, H228T, and H228V) have been linked in the literature to the mechanism of glucose tolerance and stimulation in GH1 β-glucosidase. Hence, SSV was capable of detecting promising mutations, already validated by in vitro experiments, that improved the inhibition resistance of a β-glucosidase and, consequently, its catalytic activity. SSV might be useful for the engineering of enzymes used in biofuel production or other industrial applications.
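For readers unfamiliar with the two metrics quoted above, the following sketch shows how precision and specificity are computed from binary labels. The labels are hypothetical and do not reproduce the 27 mutations of the study; the function name is likewise only illustrative.

```python
def precision_and_specificity(y_true, y_pred):
    """Return (precision, specificity) for paired boolean labels.

    precision   = TP / (TP + FP)
    specificity = TN / (TN + FP)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, specificity

# Hypothetical labels: True means "beneficial mutation".
y_true = [True, True, True, False, False, False, False, True]
y_pred = [True, True, False, False, False, True, False, True]

precision, specificity = precision_and_specificity(y_true, y_pred)
print(precision, specificity)
```

Both metrics penalize false positives, which matters when each predicted beneficial mutation would be followed up with costly in vitro work.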
|
22
|
Fang H, Zhang Z. An Enhanced Visualization Method to Aid Behavioral Trajectory Pattern Recognition Infrastructure for Big Longitudinal Data. IEEE TRANSACTIONS ON BIG DATA 2018; 4:289-298. [PMID: 29888298 PMCID: PMC5990046 DOI: 10.1109/tbdata.2017.2653815] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]
Abstract
Big longitudinal data provide more reliable information for decision making and are common in all kinds of fields. Trajectory pattern recognition is urgently needed to discover important structures in such data. Developing better and more computationally efficient visualization tools is crucial to guide this technique. This paper proposes an enhanced projection pursuit (EPP) method to better project and visualize the structures (e.g. clusters) of big high-dimensional (HD) longitudinal data on a lower-dimensional plane. Unlike classic PP methods potentially useful for longitudinal data, EPP is built upon nonlinear mapping algorithms to compute its stress (error) function by balancing the paired weights for between and within structure stress while preserving original structure membership in the high-dimensional space. Specifically, EPP solves an NP-hard optimization problem by integrating gradual optimization and nonlinear mapping algorithms, and automates the search for an optimal number of iterations to display a stable structure for varying sample sizes and dimensions. Using publicly available UCI and real longitudinal clinical trial datasets as well as simulation, EPP demonstrates better performance in visualizing big HD longitudinal data.
Affiliation(s)
- Hua Fang
- Department of Computer and Information Science, Department of Mathematics, University of Massachusetts Dartmouth, 285 Old Westport Rd, Dartmouth, MA, 02747, and Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, 01605
| | - Zhaoyang Zhang
- College of Engineering, University of Massachusetts Dartmouth and Department of Quantitative Health Sciences, University of Massachusetts Medical School
| |
|
23
|
Albanaz ATS, Rodrigues CHM, Pires DEV, Ascher DB. Combating mutations in genetic disease and drug resistance: understanding molecular mechanisms to guide drug design. Expert Opin Drug Discov 2017; 12:553-563. [PMID: 28490289 DOI: 10.1080/17460441.2017.1322579] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
Introduction: Mutations introduce diversity into genomes, leading to selective changes and driving evolution. These changes have contributed to the emergence of many of the current major health concerns of the 21st century, from the development of genetic diseases and cancers to the rise and spread of drug resistance. The experimental systematic testing of all mutations in a system of interest is impractical and not cost-effective, which has created interest in the development of computational tools to understand the molecular consequences of mutations to aid and guide rational experimentation. Areas covered: Here, the authors discuss the recent development of computational methods to understand the effects of coding mutations on protein function and interactions, particularly in the context of the 3D structure of the protein. Expert opinion: While significant progress has been made in terms of innovative tools to understand and quantify the range of effects through which a mutation or a set of mutations can give rise to a phenotype, a great gap still exists when integrating these predictions and drawing causality conclusions linking variants. This often requires a detailed understanding of the system being perturbed. However, as part of the drug development process it can be used preemptively in a similar fashion to pharmacokinetics predictions, to guide the development of therapeutics and to help guide the design and analysis of clinical trials, patient treatment and public health policy strategies.
Affiliation(s)
- Amanda T S Albanaz
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Carlos H M Rodrigues
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Douglas E V Pires
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil
| | - David B Ascher
- Centro de Pesquisas René Rachou, FIOCRUZ, Belo Horizonte, MG, Brazil; Department of Biochemistry, University of Cambridge, Cambridge, Cambridgeshire, UK; Department of Biochemistry and Molecular Biology, University of Melbourne, Melbourne, Victoria, Australia
| |
|
24
|
McSkimming DI, Rasheed K, Kannan N. Classifying kinase conformations using a machine learning approach. BMC Bioinformatics 2017; 18:86. [PMID: 28152981 PMCID: PMC5290640 DOI: 10.1186/s12859-017-1506-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/03/2016] [Accepted: 01/28/2017] [Indexed: 02/07/2023] Open
Abstract
Background: Signaling proteins such as protein kinases adopt a diverse array of conformations to respond to regulatory signals in signaling pathways. Perhaps the most fundamental conformational change of a kinase is the transition between active and inactive states, and defining the conformational features associated with kinase activation is critical for selectively targeting abnormally regulated kinases in diseases. While manual examination of crystal structures has led to the identification of key structural features associated with kinase activation, the large number of kinase crystal structures (~3,500) and the extensive conformational diversity displayed by the protein kinase superfamily pose unique challenges in fully defining the conformational features associated with kinase activation. Although some computational approaches have been proposed, they are typically based on a small subset of crystal structures using measurements biased towards the active site geometry.
Results: We utilize an unbiased informatics-based machine learning approach to classify all eukaryotic protein kinase conformations deposited in the PDB. We show that the orientation of the activation segment, measured by φ, ψ, χ1, and pseudo-dihedral angles, classifies kinase crystal conformations more accurately than existing methods. We show that the formation of the K-E salt bridge is statistically dependent upon the activation segment orientation and identify evolutionary differences between the activation segment conformation of tyrosine and serine/threonine kinases. We provide evidence that our method can identify conformational changes associated with the binding of allosteric regulatory proteins, and show that the greatest variation in inactive structures comes from kinase group and family specific side chain orientations.
Conclusion: We have provided the first comprehensive machine learning based classification of protein kinase active/inactive conformations, taking into account more structures and measurements than any previous classification effort. Further, our unbiased classification of inactive structures reveals residues associated with kinase functional specificity. To enable classification of new crystal structures, we have made our classifier publicly accessible through a stand-alone program housed at https://github.com/esbg/kinconform [DOI:10.5281/zenodo.249090].
Affiliation(s)
| | - Khaled Rasheed
- Department of Computer Science, University of Georgia, Athens, GA, 30602, USA
| | - Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, Athens, GA, 30602, USA; Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA, 30602, USA
| |
|
25
|
He W, Liang Z, Teng M, Niu L. LibME-automatic extraction of 3D ligand-binding motifs for mechanistic analysis of protein-ligand recognition. FEBS Open Bio 2016; 6:1331-1340. [PMID: 28255540 PMCID: PMC5324770 DOI: 10.1002/2211-5463.12150] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/07/2016] [Revised: 10/26/2016] [Accepted: 10/27/2016] [Indexed: 11/23/2022] Open
Abstract
Identifying conserved binding motifs is an efficient way to study protein–ligand recognition. Most 3D binding motifs only contain information from the protein side, and so motifs that combine information from both protein and ligand sides are desired. Here, we propose an algorithm called LibME (Ligand‐binding Motif Extractor), which automatically extracts 3D binding motifs composed of the target ligand and surrounding conserved residues. We show that the motifs extracted by LibME for ATP and its analogs are highly similar to well‐known motifs reported by previous studies. The superiority of our method in handling flexible ligands was also demonstrated using isocitric acid as an example. Finally, we show that these motifs, together with their visual exhibition, permit better investigation and understanding of the protein–ligand recognition process.
Affiliation(s)
- Wei He
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Anhui, China
| | - Zhi Liang
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Anhui, China
| | - MaiKun Teng
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Anhui, China
| | - LiWen Niu
- Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Anhui, China
| |
|
26
|
Pires DEV, Ascher DB. CSM-lig: a web server for assessing and comparing protein-small molecule affinities. Nucleic Acids Res 2016; 44:W557-61. [PMID: 27151202 PMCID: PMC4987933 DOI: 10.1093/nar/gkw390] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Received: 02/21/2016] [Accepted: 04/28/2016] [Indexed: 12/21/2022] Open
Abstract
Determining the affinity of a ligand for a given protein is a crucial component of drug development and of understanding the ligand's biological effects. Predicting binding affinities is a challenging task, and despite being regarded as poorly predictive, scoring functions play an important role in the analysis of molecular docking results. Here, we present CSM-Lig (http://structure.bioc.cam.ac.uk/csm_lig), a web server tailored to predict the binding affinity of a protein-small molecule complex, encompassing both protein and small-molecule complementarity in terms of shape and chemistry via graph-based structural signatures. CSM-Lig was trained and evaluated on different releases of the PDBbind databases, achieving a correlation of up to 0.86 on 10-fold cross validation and 0.80 in blind tests, performing as well as or better than other widely used methods. The web server allows users to rapidly and automatically predict binding affinities of collections of structures and assess the interactions made. We believe CSM-Lig will be an invaluable tool for helping assess docking poses and the effects of multiple mutations, including insertions, deletions and alternative splicing events, on protein-small molecule affinity, unraveling important aspects that drive protein–compound recognition.
Affiliation(s)
- Douglas E V Pires
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-002, Brazil
| | - David B Ascher
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte, 30190-002, Brazil; Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK; Department of Biochemistry, University of Melbourne, Victoria 3010, Australia
| |
|
27
|
Ain QU, Aleksandrova A, Roessler FD, Ballester PJ. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL MOLECULAR SCIENCE 2015; 5:405-424. [PMID: 27110292 PMCID: PMC4832270 DOI: 10.1002/wcms.1225] [Citation(s) in RCA: 190] [Impact Index Per Article: 21.1] [Received: 06/03/2015] [Revised: 07/17/2015] [Accepted: 07/18/2015] [Indexed: 12/29/2022]
Abstract
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of an SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225
Affiliation(s)
- Qurrat Ul Ain
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | | | - Florian D Roessler
- Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, UK
| | - Pedro J Ballester
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université, CNRS UMR7258), Marseille, France
| |
|
28
|
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
|
29
|
Gonçalves WRS, Gonçalves-Almeida VM, Arruda AL, Meira W, da Silveira CH, Pires DEV, de Melo-Minardi RC. PDBest: a user-friendly platform for manipulating and enhancing protein structures. Bioinformatics 2015; 31:2894-6. [PMID: 25910698 DOI: 10.1093/bioinformatics/btv223] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/06/2015] [Accepted: 04/19/2015] [Indexed: 11/14/2022] Open
Abstract
PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. With an intuitive graphical interface it allows users with no programming background to download and manipulate their files. The platform also exports protocols, enabling users to easily share PDB searching and filtering criteria, enhancing analysis reproducibility.
Availability and implementation: PDBest installation packages are freely available for several platforms at http://www.pdbest.dcc.ufmg.br
Contact: wellisson@dcc.ufmg.br, dpires@dcc.ufmg.br, raquelcm@dcc.ufmg.br
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Aleksander L Arruda
- Department of Computer Science, Universidade Federal de Minas Gerais, Brazil
| | - Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Brazil
| | | | | | | |
|
30
|
Pires DEV, Blundell TL, Ascher DB. pkCSM: Predicting Small-Molecule Pharmacokinetic and Toxicity Properties Using Graph-Based Signatures. J Med Chem 2015; 58:4066-72. [PMID: 25860834 PMCID: PMC4434528 DOI: 10.1021/acs.jmedchem.5b00104] [Citation(s) in RCA: 2009] [Impact Index Per Article: 223.2] [Indexed: 12/15/2022]
Abstract
Drug development has a high attrition rate, with poor pharmacokinetic and safety properties a significant hurdle. Computational approaches may help minimize these risks. We have developed a novel approach (pkCSM) which uses graph-based signatures to develop predictive models of central ADMET properties for drug development. pkCSM performs as well as or better than current methods. A freely accessible web server (http://structure.bioc.cam.ac.uk/pkcsm), which retains no information submitted to it, provides an integrated platform to rapidly evaluate pharmacokinetic and toxicity properties.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.; Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.
| | - David B Ascher
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Sanger Building, Cambridge, Cambridgeshire CB2 1GA, U.K.
| |
|
31
|
From local to global changes in proteins: a network view. Curr Opin Struct Biol 2015; 31:1-8. [DOI: 10.1016/j.sbi.2015.02.015] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 10/15/2014] [Revised: 02/15/2015] [Accepted: 02/26/2015] [Indexed: 02/01/2023]
|
32
|
Ochoa-Montaño B, Mohan N, Blundell TL. CHOPIN: a web resource for the structural and functional proteome of Mycobacterium tuberculosis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav026. [PMID: 25833954 PMCID: PMC4381106 DOI: 10.1093/database/bav026] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/18/2014] [Accepted: 03/01/2015] [Indexed: 11/18/2022]
Abstract
Tuberculosis kills more than a million people annually and presents increasingly high levels of resistance against current first line drugs. Structural information about Mycobacterium tuberculosis (Mtb) proteins is a valuable asset for the development of novel drugs and for understanding the biology of the bacterium; however, only about 10% of the ∼4000 proteins have had their structures determined experimentally. The CHOPIN database assigns structural domains and generates homology models for 2911 sequences, corresponding to ∼73% of the proteome. A sophisticated pipeline allows multiple models to be created using conformational states characteristic of different oligomeric states and ligand binding, such that the models reflect various functional states of the proteins. Additionally, CHOPIN includes structural analyses of mutations potentially associated with drug resistance. Results are made available at the web interface, which also serves as an automatically updated repository of all published Mtb experimental structures. Its RESTful interface allows direct and flexible access to structures and metadata via intuitive URLs, enabling easy programmatic use of the models. Database URL: http://structure.bioc.cam.ac.uk/chopin
Affiliation(s)
- Bernardo Ochoa-Montaño
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK and Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
| | - Nishita Mohan
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK and Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK and Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
| |
|
33
|
Li H, Leung KS, Wong MH, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform 2015; 34:115-26. [PMID: 27490034 DOI: 10.1002/minf.201400132] [Citation(s) in RCA: 151] [Impact Index Per Article: 16.8] [Received: 09/23/2014] [Accepted: 12/06/2014] [Indexed: 12/28/2022]
Abstract
There is a growing body of evidence showing that machine learning regression results in more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim at optimizing the affinity of ligands for a target rely on how accurate their predicted ranking is. However, despite their proven advantages, machine-learning scoring functions are still not widely applied. This seems to be due to insufficient understanding of their properties and the lack of user-friendly software implementing them. Here we present a study where the accuracy of AutoDock Vina, arguably the most commonly-used docking software, is strongly improved by following a machine learning approach. We also analyse the factors that are responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement will be larger as more data becomes available for training Random Forest models, as regression models implying additive functional forms do not improve with more training data. We discuss how the latter opens the door to new opportunities in scoring function development. In order to facilitate the translation of this advance to enhance structure-based molecular design, we provide software to directly re-score Vina-generated poses and thus strongly improve their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz.
Affiliation(s)
- Hongjian Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Man-Hon Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Pedro J Ballester
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France; Institut Paoli-Calmettes, F-13009 Marseille, France; Aix-Marseille Université, F-13284 Marseille, France; CNRS UMR7258, F-13009 Marseille, France
| |
|
34
|
Ascher DB, Jubb HC, Pires DEV, Ochi T, Higueruelo A, Blundell TL. Protein-Protein Interactions: Structures and Druggability. MULTIFACETED ROLES OF CRYSTALLOGRAPHY IN MODERN DRUG DISCOVERY 2015. [DOI: 10.1007/978-94-017-9719-1_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
|
35
|
Gossage L, Pires DEV, Olivera-Nappa Á, Asenjo J, Bycroft M, Blundell TL, Eisen T. An integrated computational approach can classify VHL missense mutations according to risk of clear cell renal carcinoma. Hum Mol Genet 2014; 23:5976-88. [PMID: 24969085 PMCID: PMC4204774 DOI: 10.1093/hmg/ddu321] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/25/2014] [Revised: 05/25/2014] [Accepted: 06/17/2014] [Indexed: 12/26/2022] Open
Abstract
Mutations in the von Hippel-Lindau (VHL) gene are pathogenic in VHL disease, congenital polycythaemia and clear cell renal carcinoma (ccRCC). pVHL forms a ternary complex with elongin C and elongin B, critical for pVHL stability and function, which interacts with Cullin-2 and RING-box protein 1 to target hypoxia-inducible factor for polyubiquitination and proteasomal degradation. We describe a comprehensive database of missense VHL mutations linked to experimental and clinical data. We use predictions from in silico tools to link the functional effects of missense VHL mutations to phenotype. The risk of ccRCC in VHL disease is linked to the degree of destabilization resulting from missense mutations. An optimized binary classification system (symphony), which integrates predictions from five in silico methods, can predict the risk of ccRCC associated with VHL missense mutations with high sensitivity and specificity. We use symphony to generate predictions for risk of ccRCC for all possible VHL missense mutations and present these predictions, in association with clinical and experimental data, in a publicly available, searchable web server.
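The abstract does not say how symphony combines its five inputs, so the following is only a generic sketch of one way to integrate several binary predictors into a single call: a vote with a configurable threshold. The tool names, the threshold of three votes, and the function name are all hypothetical.

```python
def consensus_call(predictions, min_votes=3):
    """Return True when at least `min_votes` predictors flag the mutation.

    `predictions` maps a predictor name to its boolean call. This voting
    rule is an illustrative assumption, not the published classifier.
    """
    votes = sum(1 for flagged in predictions.values() if flagged)
    return votes >= min_votes

# Hypothetical calls from five in silico tools for one missense mutation.
calls = {
    "tool_1": True,
    "tool_2": True,
    "tool_3": False,
    "tool_4": True,
    "tool_5": False,
}
high_risk = consensus_call(calls)
print(high_risk)
```

Raising `min_votes` trades sensitivity for specificity, which is the kind of balance an optimized binary classifier has to strike.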
Affiliation(s)
- Lucy Gossage
- Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Álvaro Olivera-Nappa
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK, Centre for Biochemical Engineering and Biotechnology, University of Chile, Beauchef 850, Santiago, Chile
- Juan Asenjo
- Centre for Biochemical Engineering and Biotechnology, University of Chile, Beauchef 850, Santiago, Chile
- Mark Bycroft
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Research Centre, Cambridge CB2 0QH, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Tim Eisen
- Department of Oncology, Cambridge University Hospitals NHS Foundation Trust, Box 193 (R4) Addenbrooke's Hospital, Cambridge Biomedical Campus, Hill's Road, Cambridge CB2 0QQ, UK
|
36
|
Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014; 28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/29/2014] [Accepted: 06/09/2014] [Indexed: 12/31/2022]
Abstract
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or software as a service commercial offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as it accumulates from high throughput screening and enables the user to draw insights, enable predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
|
37
|
Silveira SA, Fassio AV, Gonçalves-Almeida VM, de Lima EB, Barcelos YT, Aburjaile FF, Rodrigues LM, Meira W, de Melo-Minardi RC. VERMONT: Visualizing mutations and their effects on protein physicochemical and topological property conservation. BMC Proc 2014; 8:S4. [PMID: 25237391 PMCID: PMC4155615 DOI: 10.1186/1753-6561-8-s2-s4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022] Open
Abstract
In this paper, we propose an interactive visualization called VERMONT which tackles the problem of visualizing mutations and infers their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters that should aid biologists in gaining insight into possible consequences of mutations. VERMONT allowed us to identify patterns of position-specific properties as well as exceptions that may help predict whether specific mutations could damage protein function.
Affiliation(s)
- Sabrina A Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Alexandre V Fassio
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Valdete M Gonçalves-Almeida
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Elisa B de Lima
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Yussif T Barcelos
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Flávia F Aburjaile
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Laerte M Rodrigues
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil; Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
- Raquel C de Melo-Minardi
- Department of Computer Science, Universidade Federal de Minas Gerais, Av. Antônio Carlos 6.627, 31270-901, Belo Horizonte, Brazil
|
38
|
Pires DEV, Ascher DB, Blundell TL. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res 2014; 42:W314-9. [PMID: 24829462 PMCID: PMC4086143 DOI: 10.1093/nar/gku411] [Citation(s) in RCA: 590] [Impact Index Per Article: 59.0] [Indexed: 01/06/2023] Open
Abstract
Cancer genome and other sequencing initiatives are generating extensive data on non-synonymous single nucleotide polymorphisms (nsSNPs) in human and other genomes. In order to understand the impacts of nsSNPs on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are required to study and predict their effects on protein stability. Despite the diversity of available computational methods in the literature, none has proven accurate and dependable on its own under all scenarios where mutation analysis is required. Here we present DUET, a web server for an integrated computational approach to study missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM). We demonstrate that the proposed method improves overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. The DUET web server is freely and openly available at http://structure.bioc.cam.ac.uk/duet.
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- David B Ascher
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK; ACRF Rational Drug Discovery Centre and Biota Structural Biology Laboratory, St Vincents Institute of Medical Research, Fitzroy, VIC 3065, Australia
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
|
39
|
Silveira SDA, de Melo-Minardi RC, da Silveira CH, Santoro MM, Meira Jr W. ENZYMAP: exploiting protein annotation for modeling and predicting EC number changes in UniProt/Swiss-Prot. PLoS One 2014; 9:e89162. [PMID: 24586563 PMCID: PMC3929618 DOI: 10.1371/journal.pone.0089162] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/12/2013] [Accepted: 01/19/2014] [Indexed: 11/18/2022] Open
Abstract
The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions and phenotype studies have been produced. The majority of data generated by high-throughput devices is automatically annotated because manually annotating them is not possible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and associated annotations. We proposed ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test data sets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended to be an automatic complementary method (that can be used together with other techniques like the ones based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time, suggesting possible corrections, anticipating annotation changes and propagating the implicit knowledge for the whole dataset.
Affiliation(s)
- Sabrina de Azevedo Silveira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (SAS); (WM)
- Marcelo Matos Santoro
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira Jr
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (SAS); (WM)
|
40
|
Pires DEV, Ascher DB, Blundell TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 2013; 30:335-42. [PMID: 24281696 PMCID: PMC3904523 DOI: 10.1093/bioinformatics/btt691] [Citation(s) in RCA: 657] [Impact Index Per Article: 59.7] [Indexed: 11/15/2022]
Abstract
Motivation: Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein–protein and protein–nucleic acid interactions. Results: We show that mCSM performs as well as or better than other methods that are used widely. The mCSM signatures were successfully used in different tasks demonstrating that the impact of a mutation can be correlated with the atomic-distance patterns surrounding an amino acid residue. We showed that mCSM can predict stability changes of a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario. Availability and implementation: A web server is available at http://structure.bioc.cam.ac.uk/mcsm. Contact: dpires@dcc.ufmg.br; tom@cryst.bioc.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
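As a rough illustration of what "distance patterns between atoms" can look like as a feature vector, the sketch below counts atom pairs, grouped by atom class, that fall within a series of distance cutoffs. It is a simplified, hypothetical rendering and not the mCSM implementation: the function name `distance_signature`, the atom classes, the coordinates, and the cutoff values are all assumptions chosen for the example.

```python
from itertools import combinations, combinations_with_replacement
from math import dist


def distance_signature(atoms, cutoffs):
    """Count atom pairs per (class, class, cutoff) bucket.

    atoms:   list of (atom_class, (x, y, z)) tuples; classes are hypothetical labels.
    cutoffs: distance thresholds in angstroms.
    Returns a dict mapping (class_a, class_b, cutoff) to the number of
    atom pairs of those classes whose distance is <= cutoff (cumulative counts).
    """
    classes = sorted({atom_class for atom_class, _ in atoms})
    # Initialise every bucket to zero so the signature has a fixed layout.
    signature = {
        (a, b, c): 0
        for a, b in combinations_with_replacement(classes, 2)
        for c in cutoffs
    }
    for (class_a, pos_a), (class_b, pos_b) in combinations(atoms, 2):
        first, second = sorted((class_a, class_b))  # unordered class pair
        separation = dist(pos_a, pos_b)
        for c in cutoffs:
            if separation <= c:
                signature[(first, second, c)] += 1
    return signature


# Toy residue environment (made-up coordinates, in angstroms).
example_atoms = [
    ("hydrophobic", (0.0, 0.0, 0.0)),
    ("polar", (3.0, 0.0, 0.0)),
    ("hydrophobic", (0.0, 4.0, 0.0)),
]
signature = distance_signature(example_atoms, cutoffs=(3.5, 6.0))
for bucket, count in sorted(signature.items()):
    print(bucket, count)
```

The flattened counts form a fixed-length vector that could be passed to any supervised learner; which atom classes and cutoffs are appropriate is a modelling decision this sketch does not address.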
Affiliation(s)
- Douglas E V Pires
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK and ACRF Rational Drug Discovery Centre and Biota Structural Biology Laboratory, St Vincents Institute of Medical Research, Fitzroy, VIC, 3065, Australia
|
41
|
Ponder EL, Freundlich JS, Sarker M, Ekins S. Computational models for neglected diseases: gaps and opportunities. Pharm Res 2013; 31:271-7. [PMID: 23990313 DOI: 10.1007/s11095-013-1170-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/02/2013] [Accepted: 07/28/2013] [Indexed: 01/22/2023]
Abstract
Neglected diseases, such as Chagas disease, African sleeping sickness, and intestinal worms, affect millions of the world's poor. They disproportionately affect marginalized populations, lack effective treatments or vaccines, or existing products are not accessible to the populations affected. Computational approaches have been used across many of these diseases for various aspects of research or development, and yet data produced by computational approaches are not integrated and widely accessible to others. Here, we identify gaps in which computational approaches have been used for some neglected diseases and not others. We also make recommendations for the broad-spectrum integration of these techniques into a neglected disease drug discovery and development workflow.
Affiliation(s)
- Elizabeth L Ponder
- Center for Emerging and Neglected Diseases, Berkeley, 444A Li Ka Shing Center, Berkeley, California, 94720-3370, USA
|