1
Hanna G, Khanna T, Islam SA, David A, Sternberg MJE. Missense3D-TM: Predicting the Effect of Missense Variants in Helical Transmembrane Protein Regions Using 3D Protein Structures. J Mol Biol 2024; 436:168374. [PMID: 38182301 DOI: 10.1016/j.jmb.2023.168374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 01/07/2024]
Abstract
Variant effect predictors assess whether a substitution is pathogenic or benign. Most predictors, including those that are structure-based, are designed for globular proteins in aqueous environments and do not account for a variant residue being located within the membrane. We report Missense3D-TM, which provides a structure-based assessment of the impact of a missense variant located within a membrane. On a dataset of 2,078 pathogenic and 1,060 benign variants, spanning 711 proteins from 706 structures, Missense3D-TM achieved an accuracy of 66%, Matthews correlation coefficient of 0.37, sensitivity of 58% and specificity of 81%. Missense3D-TM performed similarly to mCSM-membrane: accuracy 66% vs 61% (p = 0.02) on an unbalanced test set and 70% vs 67% (p = 0.20) on a balanced test set. The Missense3D-TM website provides an analysis of the structural effects of the variant along with its predicted position within the membrane. The web server is available at http://missense3d.bc.ic.ac.uk/.
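The four performance figures quoted in this abstract are all functions of one confusion matrix. The following is a minimal sketch of how they relate. The abstract does not give the confusion-matrix counts, so they are back-calculated here from the rounded sensitivity and specificity and should be read as an assumption rather than the authors' data.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, MCC) for a binary classifier."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants detected
    specificity = tn / (tn + fp)  # fraction of benign variants detected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# ASSUMPTION: counts reconstructed from the rounded percentages in the abstract.
n_pathogenic, n_benign = 2078, 1060
tp = round(0.58 * n_pathogenic)  # pathogenic variants predicted damaging
fn = n_pathogenic - tp
tn = round(0.81 * n_benign)      # benign variants predicted neutral
fp = n_benign - tn

acc, sens, spec, mcc = confusion_metrics(tp, fn, tn, fp)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} "
      f"specificity={spec:.2f} MCC={mcc:.2f}")
```

With these reconstructed counts the accuracy and Matthews correlation coefficient round to the reported 66% and 0.37, so the four published figures are mutually consistent.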
Affiliation(s)
- Gordon Hanna
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Tarun Khanna
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Suhail A Islam
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
2
Mathews DH, Casadio R, Sternberg MJE. Computational Resources for Molecular Biology 2023. J Mol Biol 2023:168160. [PMID: 37244569 DOI: 10.1016/j.jmb.2023.168160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Affiliation(s)
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Rita Casadio
  - Biocomputing Group, FABIT-University of Bologna, Bologna I-40126, Italy
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
3
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
4
McGreig JE, Uri H, Antczak M, Sternberg MJE, Michaelis M, Wass MN. 3DLigandSite: structure-based prediction of protein-ligand binding sites. Nucleic Acids Res 2022; 50:W13-W20. [PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/09/2022] [Revised: 03/13/2022] [Accepted: 04/03/2022] [Indexed: 01/13/2023] Open
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update since the first release of 3DLigandSite in 2010. The overall methodology remains the same, with candidate binding sites in proteins inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses the newly available structures from the AlphaFold database or alternatively Phyre2 when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands that are used to infer the ligand-binding residues in the query protein. Finally, we introduced a machine learning element as the final prediction step, which improves the accuracy of predictions and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites obtained 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or structure. Results are displayed in multiple formats including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
Affiliation(s)
- Jake E McGreig
  - School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Hannah Uri
  - School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Magdalena Antczak
  - School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Martin Michaelis
  - School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Mark N Wass
  - School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
5
David A, Islam S, Tankhilevich E, Sternberg MJE. The AlphaFold Database of Protein Structures: A Biologist's Guide. J Mol Biol 2021; 434:167336. [PMID: 34757056 PMCID: PMC8783046 DOI: 10.1016/j.jmb.2021.167336] [Citation(s) in RCA: 98] [Impact Index Per Article: 32.7] [Received: 09/20/2021] [Revised: 10/25/2021] [Accepted: 10/26/2021] [Indexed: 01/06/2023]
Abstract
AlphaFold, the deep learning algorithm developed by DeepMind, recently released the three-dimensional models of the whole human proteome to the scientific community. Here we discuss the advantages, limitations and the still unsolved challenges of the AlphaFold models from the perspective of a biologist, who may not be an expert in structural biology.
Affiliation(s)
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Suhail Islam
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Evgeny Tankhilevich
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
6
Kelley LA, Powell HR, Sternberg MJE. Le mieux est l'enemi du bon. Homology modelling with Phyre2 in a deep learning world. Acta Crystallogr A Found Adv 2021. [DOI: 10.1107/s0108767321095842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
7
Casadio R, Lenhard B, Sternberg MJE. Computational Resources for Molecular Biology 2021. J Mol Biol 2021; 433:166962. [PMID: 33774035 DOI: 10.1016/j.jmb.2021.166962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, FABIT-University of Bologna, Italy
- Boris Lenhard
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK; Computational Regulatory Genomics, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Michael J E Sternberg
  - Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
8
Leal LG, David A, Jarvelin MR, Sebert S, Männikkö M, Karhunen V, Seaby E, Hoggart C, Sternberg MJE. Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics 2020; 35:5182-5190. [PMID: 31070705 PMCID: PMC6954643 DOI: 10.1093/bioinformatics/btz310] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/22/2018] [Revised: 03/28/2019] [Accepted: 04/25/2019] [Indexed: 01/19/2023] Open
Abstract
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. Availability and implementation: An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis G Leal
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Alessia David
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Marjo-Riitta Jarvelin
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland; Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland; Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK; Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Sylvain Sebert
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland
- Minna Männikkö
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland
- Ville Karhunen
  - Center for Life Course Health Research, Faculty of Medicine, University of Oulu, Oulu FI-90014, Finland; Biocenter Oulu, University of Oulu, Oulu 90220, Finland; Unit of Primary Health Care, Oulu University Hospital, Oulu 90220, Finland; Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London W2 1PG, UK; Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex UB8 3PH, UK
- Eleanor Seaby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Clive Hoggart
  - Department of Medicine, Imperial College London, London W2 1PG, UK
- Michael J E Sternberg
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
9
Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Finn RD, Gough J, Jones D, Kelley LA, Paysan-Lafosse T, Lam SD, Murzin AG, Pandurangan AP, Salazar GA, Skwark MJ, Sternberg MJE, Velankar S, Orengo C. Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation. Nucleic Acids Res 2020; 48:D314-D319. [PMID: 31733063 PMCID: PMC7139969 DOI: 10.1093/nar/gkz967] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/23/2019] [Revised: 10/09/2019] [Accepted: 11/07/2019] [Indexed: 12/20/2022] Open
Abstract
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Affiliation(s)
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
- Antonina Andreeva
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Tom L Blundell
  - Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Daniel W A Buchan
  - Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK; The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Robert D Finn
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Julian Gough
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- David Jones
  - Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK; The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Lawrence A Kelley
  - Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Typhaine Paysan-Lafosse
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Su Datt Lam
  - Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK; Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
- Alexey G Murzin
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Gustavo A Salazar
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marcin J Skwark
  - Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Michael J E Sternberg
  - Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Sameer Velankar
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
10
Leal LG, Hoggart C, Jarvelin MR, Herzig KH, Sternberg MJE, David A. A polygenic biomarker to identify patients with severe hypercholesterolemia of polygenic origin. Mol Genet Genomic Med 2020; 8:e1248. [PMID: 32307928 PMCID: PMC7284038 DOI: 10.1002/mgg3.1248] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/04/2019] [Revised: 02/24/2020] [Accepted: 03/02/2020] [Indexed: 12/11/2022] Open
Abstract
Background: Severe hypercholesterolemia (HC, LDL-C > 4.9 mmol/L) affects over 30 million people worldwide. In this study, we validated a new polygenic risk score (PRS) for LDL-C. Methods: Summary statistics from the Global Lipid Genome Consortium and genotype data from two large populations were used. Results: A 36-SNP PRS was generated using data for 2,197 white Americans. In a replication cohort of 4,787 Finns, the PRS was strongly associated with the LDL-C trait and explained 8% of its variability (p = 10⁻⁴¹). After risk categorization, the risk of having HC was higher in the high- versus low-risk group (RR = 4.17, p < 1 × 10⁻⁷). Compared to a 12-SNP LDL-C raising score (currently used in the United Kingdom), the PRS explained more LDL-C variability (8% vs. 6%). Among Finns with severe HC, 53% (66/124) versus 44% (55/124) were classified as high risk by the PRS and LDL-C raising score, respectively. Moreover, 54% of individuals with severe HC defined as low risk by the LDL-C raising score were reclassified to intermediate or high risk by the new PRS. Conclusion: The new PRS has a better predictive role in identifying HC of polygenic origin compared to the currently available method and can better stratify patients into diagnostic and therapeutic algorithms.
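A polygenic risk score of the kind described here is conventionally a weighted sum of risk-allele counts. The sketch below illustrates that general form only. The three SNP weights and the genotype are hypothetical placeholders, not the 36 SNPs or effect sizes of the study, and the function name is invented for illustration. The last lines simply re-derive the two high-risk proportions quoted in the abstract.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


# HYPOTHETICAL per-allele weights (e.g. effect on LDL-C) for three SNPs.
example_weights = [0.10, 0.05, 0.20]
# HYPOTHETICAL genotype: 2, 1 and 0 copies of each risk allele.
example_dosages = [2, 1, 0]

score = polygenic_risk_score(example_dosages, example_weights)
print(f"example PRS = {score:.2f}")

# Proportions reported in the abstract for Finns with severe HC.
high_risk_by_prs = 66 / 124
high_risk_by_raising_score = 55 / 124
print(f"high risk by PRS: {high_risk_by_prs:.0%}, "
      f"by LDL-C raising score: {high_risk_by_raising_score:.0%}")
```

In practice the score is usually compared against cut-offs derived from a reference population to assign low, intermediate or high risk; the abstract does not state the cut-offs used, so none are assumed here.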
Affiliation(s)
- Luis G Leal
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Clive Hoggart
  - Department of Medicine, Imperial College London, London, United Kingdom
- Marjo-Riitta Jarvelin
  - Faculty of Medicine, Center for Life Course Health Research, University of Oulu, Oulu, Finland; Biocenter Oulu, University of Oulu, Oulu, Finland; Unit of Primary Health Care, Oulu University Hospital, Oulu, Finland; Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, United Kingdom; Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Middlesex, United Kingdom
- Karl-Heinz Herzig
  - Biocenter Oulu, University of Oulu, Oulu, Finland; Research Unit of Biomedicine, Oulu University, Oulu, Oulu University Hospital and Medical Research Center Oulu, Oulu, Finland; Department of Gastroenterology and Metabolism, Poznan University of Medical Sciences, Poznan, Poland
- Michael J E Sternberg
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
- Alessia David
  - Department of Life Sciences, Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
11
Singh A, Dauzhenka T, Kundrotas PJ, Sternberg MJE, Vakser IA. Application of docking methodologies to modeled proteins. Proteins 2020; 88:1180-1188. [PMID: 32170770 DOI: 10.1002/prot.25889] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/07/2019] [Revised: 02/15/2020] [Accepted: 03/07/2020] [Indexed: 12/12/2022]
Abstract
Protein docking is essential for structural characterization of protein interactions. Besides providing the structure of protein complexes, modeling of proteins and their complexes is important for understanding the fundamental principles and specific aspects of protein interactions. The accuracy of protein modeling, in general, is still less than that of the experimental approaches. Thus, it is important to investigate the applicability of docking techniques to modeled proteins. We present new comprehensive benchmark sets of protein models for the development and validation of protein docking, as well as a systematic assessment of free and template-based docking techniques on these sets. As opposed to previous studies, the benchmark sets reflect the real case modeling/docking scenario where the accuracy of the models is assessed by the modeling procedure, without reference to the native structure (which would be unknown in practical applications). We also expanded the analysis to include docking of protein pairs where proteins have different structural accuracy. The results show that, in general, the template-based docking is less sensitive to the structural inaccuracies of the models than the free docking. The near-native docking poses generated by the template-based approach, typically, also have higher ranks than those produced by the free docking (although the free docking is indispensable in modeling the multiplicity of protein interactions in a crowded cellular environment). The results show that docking techniques are applicable to protein models in a broad range of modeling accuracy. The study provides clear guidelines for practical applications of docking to protein models.
Affiliation(s)
- Amar Singh
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Taras Dauzhenka
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Petras J Kundrotas
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, South Kensington, London, UK
- Ilya A Vakser
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
12
Wodak SJ, Velankar S, Sternberg MJE. Modeling protein interactions and complexes in CAPRI: Seventh CAPRI evaluation meeting, April 3-5 EMBL-EBI, Hinxton, UK. Proteins 2020; 88:913-915. [DOI: 10.1002/prot.25883] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/27/2019] [Accepted: 01/25/2020] [Indexed: 11/11/2022]
Affiliation(s)
- Sameer Velankar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
13
Cornish AJ, David A, Sternberg MJE. PhenoRank: reducing study bias in gene prioritization through simulation. Bioinformatics 2019; 34:2087-2095. [PMID: 29360927 PMCID: PMC5949213 DOI: 10.1093/bioinformatics/bty028] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 09/21/2017] [Accepted: 01/16/2018] [Indexed: 02/07/2023] Open
Abstract
Motivation: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes, however, potentially influencing the performance of these methods. Results: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10⁻¹⁶). Availability and implementation: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Supplementary information: Supplementary data are available at Bioinformatics online.
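The bias-correction idea in this abstract, scoring each gene against its own distribution of scores from simulated phenotype-term sets, can be illustrated with an empirical p-value. This is a minimal sketch of that general idea, not PhenoRank's actual implementation. The function name, the add-one correction and all scores below are assumptions made for illustration.

```python
def empirical_p_value(observed_score, simulated_scores):
    """Share of simulated scores at least as large as the observed score.

    The add-one correction keeps the estimate above zero when no
    simulated score reaches the observed one.
    """
    n_at_least = sum(1 for s in simulated_scores if s >= observed_score)
    return (n_at_least + 1) / (len(simulated_scores) + 1)


# HYPOTHETICAL well-studied gene: a high raw score, but it also scores
# highly for simulated diseases because it has a lot of available data.
well_studied_observed = 0.80
well_studied_simulated = [0.70, 0.80, 0.90, 0.85, 0.75, 0.95, 0.80, 0.90, 0.70]

# HYPOTHETICAL less-studied gene: a lower raw score, but clearly above
# what it obtains for simulated diseases.
less_studied_observed = 0.40
less_studied_simulated = [0.10, 0.20, 0.15, 0.05, 0.10, 0.20, 0.25, 0.10, 0.15]

p_well = empirical_p_value(well_studied_observed, well_studied_simulated)
p_less = empirical_p_value(less_studied_observed, less_studied_simulated)
print(f"well-studied gene p = {p_well:.2f}")
print(f"less-studied gene p = {p_less:.2f}")
```

Ranking by raw score would put the well-studied gene first, whereas ranking by the simulation-based p-value favours the less-studied gene, which is the kind of reordering the abstract attributes to removing data-availability bias.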
Affiliation(s)
- Alex J Cornish
  - Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Alessia David
  - Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
- Michael J E Sternberg
  - Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
14
Ciezarek AG, Osborne OG, Shipley ON, Brooks EJ, Tracey SR, McAllister JD, Gardner LD, Sternberg MJE, Block B, Savolainen V. Phylotranscriptomic Insights into the Diversification of Endothermic Thunnus Tunas. Mol Biol Evol 2019; 36:84-96. [PMID: 30364966 PMCID: PMC6340463 DOI: 10.1093/molbev/msy198] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/17/2022] Open
Abstract
Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas are commercially important fishes worldwide, and some populations are threatened. They are renowned for their endothermy, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tunas, such as yellowfin tuna, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than most temperate bluefin tuna populations, providing resiliency in the face of large-scale industrial fisheries. Despite the importance of these traits to not only fisheries but also habitat utilization and responses to climate change, little is known of the genetic processes underlying the diversification of tunas. In collecting and analyzing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation is associated with the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme that contributes to thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation, and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.
Collapse
Affiliation(s)
- Adam G Ciezarek
- Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
| | - Owen G Osborne
- Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
| | - Oliver N Shipley
- Shark Research and Conservation Program, The Cape Eleuthera Institute, Rock Sound, Eleuthera, The Bahamas
- School of Marine and Atmospheric Science, Stony Brook University, Stony Brook, NY
| | - Edward J Brooks
- Shark Research and Conservation Program, The Cape Eleuthera Institute, Rock Sound, Eleuthera, The Bahamas
| | - Sean R Tracey
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS, Australia
| | - Jaime D McAllister
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS, Australia
| | - Luke D Gardner
- Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Kensington, London, United Kingdom
| | - Barbara Block
- Department of Biology, Hopkins Marine Station, Stanford University, Pacific Grove, CA
| | - Vincent Savolainen
- Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, United Kingdom
- Corresponding author
| |
|
15
|
Lenhard B, Sternberg MJE. Computation Resources for Molecular Biology: Special Issue 2019. J Mol Biol 2019; 431:2395-2397. [PMID: 31152744 DOI: 10.1016/j.jmb.2019.05.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Boris Lenhard
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK; Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London, W12 0NN, UK.
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
| |
|
16
|
Ofoegbu TC, David A, Kelley LA, Mezulis S, Islam SA, Mersmann SF, Strömich L, Vakser IA, Houlston RS, Sternberg MJE. PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants. J Mol Biol 2019; 431:2460-2466. [PMID: 31075275 PMCID: PMC6597944 DOI: 10.1016/j.jmb.2019.04.043] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/12/2019] [Revised: 04/02/2019] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
PhyreRisk is an open-access web application for interactively bridging genomic, proteomic and structural data, facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally, multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mappings of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinate formats (Ensembl, VCF, reference SNP ID and HGVS notations) and Human Build GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk.
Affiliation(s)
- Tochukwu C Ofoegbu
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Stefans Mezulis
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Suhail A Islam
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Sophia F Mersmann
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Léonie Strömich
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66045, USA
| | - Richard S Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| |
|
17
|
Sternberg MJE, Yosef N. Computation Resources for Molecular Biology: Special Issue 2018. J Mol Biol 2018; 430:2181-2183. [PMID: 29860026 DOI: 10.1016/j.jmb.2018.05.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, South Kensington, London SW7 2AZ, UK.
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, CA 94720, USA; Chan Zuckerberg Biohub Investigator.
| |
|
18
|
Reynolds CR, Islam SA, Sternberg MJE. EzMol: A Web Server Wizard for the Rapid Visualization and Image Production of Protein and Nucleic Acid Structures. J Mol Biol 2018; 430:2244-2248. [PMID: 29391170 PMCID: PMC5961936 DOI: 10.1016/j.jmb.2018.01.013] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 11/01/2017] [Revised: 01/22/2018] [Accepted: 01/22/2018] [Indexed: 11/18/2022]
Abstract
EzMol is a molecular visualization Web server in the form of a software wizard, located at http://www.sbg.bio.ic.ac.uk/ezmol/. It is designed for easy and rapid image manipulation and display of protein molecules, and is intended for users who need to quickly produce high-resolution images of protein molecules but do not have the time or inclination to use a software molecular visualization system. EzMol allows the upload of molecular structure files in PDB format to generate a Web page including a representation of the structure that the user can manipulate. EzMol provides intuitive options for chain display, adjusting the color/transparency of residues, side chains and protein surfaces, and for adding labels to residues. The final adjusted protein image can then be downloaded as a high-resolution image. There are a range of applications for rapid protein display, including the illustration of specific areas of a protein structure and the rapid prototyping of images. We describe EzMol, a new Web server for the rapid visualization of protein structure. A wizard interface leads the user step-by-step through a focused set of options. Options are based around the most common requirements for molecular visualization. EzMol does not require any software to be downloaded and is cross-browser compatible. EzMol is designed for occasional users and does not require commands to be memorized.
Affiliation(s)
- Christopher R Reynolds
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Kensington, London SW7 2AZ, UK.
| | - Suhail A Islam
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Kensington, London SW7 2AZ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Kensington, London SW7 2AZ, UK
| |
|
19
|
Alhuzimi E, Leal LG, Sternberg MJE, David A. Properties of human genes guided by their enrichment in rare and common variants. Hum Mutat 2017; 39:365-370. [PMID: 29197136 PMCID: PMC5838408 DOI: 10.1002/humu.23377] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/27/2017] [Revised: 11/20/2017] [Accepted: 11/26/2017] [Indexed: 01/01/2023]
Abstract
We analyzed 563,099 common (minor allele frequency, MAF≥0.01) and rare (MAF < 0.01) genetic variants annotated in ExAC and UniProt and 26,884 disease‐causing variants from ClinVar and UniProt occurring in the coding region of 17,975 human protein‐coding genes. Three novel sets of genes were identified: those enriched in rare variants (n = 32 genes), in common variants (n = 282 genes), and in disease‐causing variants (n = 800 genes). Genes enriched in rare variants have far greater similarities in terms of biological and network properties to genes enriched in disease‐causing variants, than to genes enriched in common variants. However, in half of the genes enriched in rare variants (AOC2, MAMDC4, ANKHD1, CDC42BPB, SPAG5, TRRAP, TANC2, IQCH, USP54, SRRM2, DOPEY2, and PITPNM1), no disease‐causing variants have been identified in major, publicly available databases. Thus, genetic variants in these genes are strong candidates for disease and their identification, as part of sequencing studies, should prompt further in vitro analyses.
Affiliation(s)
- Eman Alhuzimi
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Luis G Leal
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Alessia David
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| |
|
20
|
Bryant WA, Stentz R, Le Gall G, Sternberg MJE, Carding SR, Wilhelm T. In Silico Analysis of the Small Molecule Content of Outer Membrane Vesicles Produced by Bacteroides thetaiotaomicron Indicates an Extensive Metabolic Link between Microbe and Host. Front Microbiol 2017; 8:2440. [PMID: 29276507 PMCID: PMC5727896 DOI: 10.3389/fmicb.2017.02440] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 08/04/2017] [Accepted: 11/24/2017] [Indexed: 11/15/2022] Open
Abstract
The interactions between the gut microbiota and its host are of central importance to the health of the host. Outer membrane vesicles (OMVs) are produced ubiquitously by Gram-negative bacteria including the gut commensal Bacteroides thetaiotaomicron. These vesicles can interact with the host in various ways but until now their complement of small molecules has not been investigated in this context. Using an untargeted high-coverage metabolomic approach we have measured the small molecule content of these vesicles in contrasting in vitro conditions to establish what role these metabolites could perform when packed into these vesicles. B. thetaiotaomicron packs OMVs with a highly conserved core set of small molecules which are strikingly enriched with mouse-digestible metabolites and with metabolites previously shown to be associated with colonization of the murine GIT. By use of an expanded genome-scale metabolic model of B. thetaiotaomicron and a potential host (the mouse) we have established many possible metabolic pathways between the two organisms that were previously unknown, and have found several putative novel metabolic functions for mouse that are supported by gene annotations, but that do not currently appear in existing mouse metabolic networks. The lipidome of these OMVs bears no relation to the mouse lipidome, so the purpose of this particular composition of lipids remains unclear. We conclude from this analysis that through intimate symbiotic evolution OMVs produced by B. thetaiotaomicron are likely to have been adopted as a conduit for small molecules bound for the mammalian host in vivo.
Affiliation(s)
- William A. Bryant
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Régis Stentz
- Gut Health and Food Safety Programme, Quadram Institute Bioscience, Norwich, United Kingdom
| | - Gwenaelle Le Gall
- Metabolomics Unit, Quadram Institute Bioscience, Norwich, United Kingdom
| | - Michael J. E. Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, United Kingdom
| | - Simon R. Carding
- Gut Health and Food Safety Programme, Quadram Institute Bioscience, Norwich, United Kingdom
- Norwich Medical School, University of East Anglia, Norwich, United Kingdom
| | - Thomas Wilhelm
- Theoretical Systems Biology Lab, Quadram Institute Bioscience, Norwich, United Kingdom
| |
|
21
|
Sundriyal S, Moniot S, Mahmud Z, Yao S, Di Fruscia P, Reynolds CR, Dexter DT, Sternberg MJE, Lam EWF, Steegborn C, Fuchter MJ. Thienopyrimidinone Based Sirtuin-2 (SIRT2)-Selective Inhibitors Bind in the Ligand Induced Selectivity Pocket. J Med Chem 2017; 60:1928-1945. [PMID: 28135086 PMCID: PMC6014686 DOI: 10.1021/acs.jmedchem.6b01690] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/17/2016] [Indexed: 02/06/2023]
Abstract
Sirtuins (SIRTs) are NAD-dependent deacylases known to be involved in a variety of pathophysiological processes, and thus remain promising therapeutic targets for further validation. Previously, we reported a novel thienopyrimidinone SIRT2 inhibitor with good potency and excellent selectivity for SIRT2. Herein, we report an extensive SAR study of this chemical series and identify the key pharmacophoric elements and physicochemical properties that underpin the excellent activity observed. New analogues have been identified with submicromolar SIRT2 inhibitory activity and good to excellent SIRT2 subtype-selectivity. Importantly, we report a cocrystal structure of one of our compounds (29c) bound to SIRT2. This reveals our series to induce the formation of a previously reported selectivity pocket but to bind in an inverted fashion to what might be intuitively expected. We believe these findings will contribute significantly to an understanding of the mechanism of action of SIRT2 inhibitors and to the identification of refined, second generation inhibitors.
Affiliation(s)
- Sandeep Sundriyal
- Department of Chemistry, Imperial College London, London SW7 2AZ, U.K.
| | - Sébastien Moniot
- Department of Biochemistry, University of Bayreuth, Universitaetsstrasse 30, 95447 Bayreuth, Germany
| | - Zimam Mahmud
- Department of Surgery & Cancer, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, U.K.
| | - Shang Yao
- Department of Surgery & Cancer, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, U.K.
| | - Paolo Di Fruscia
- Department of Chemistry, Imperial College London, London SW7 2AZ, U.K.
| | | | - David T. Dexter
- Centre for Neuroinflammation & Neurodegeneration, Division of Brain Sciences, Imperial College London, London W12 0NN, U.K.
| | | | - Eric W.-F. Lam
- Department of Surgery & Cancer, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, U.K.
| | - Clemens Steegborn
- Department of Biochemistry, University of Bayreuth, Universitaetsstrasse 30, 95447 Bayreuth, Germany
| | | |
|
22
|
Greener JG, Filippis I, Sternberg MJE. Predicting Protein Dynamics and Allostery Using Multi-Protein Atomic Distance Constraints. Structure 2017; 25:546-558. [PMID: 28190781 PMCID: PMC5343748 DOI: 10.1016/j.str.2017.01.008] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 08/11/2016] [Revised: 11/24/2016] [Accepted: 01/19/2017] [Indexed: 11/16/2022]
Abstract
The related concepts of protein dynamics, conformational ensembles and allostery are often difficult to study with molecular dynamics (MD) due to the timescales involved. We present ExProSE (Exploration of Protein Structural Ensembles), a distance geometry-based method that generates an ensemble of protein structures from two input structures. ExProSE provides a unified framework for the exploration of protein structure and dynamics in a fast and accessible way. Using a dataset of apo/holo pairs it is shown that existing coarse-grained methods often cannot span large conformational changes. For T4-lysozyme, ExProSE is able to generate ensembles that are more native-like than tCONCOORD and NMSim, and comparable with targeted MD. By adding additional constraints representing potential modulators, ExProSE can predict allosteric sites. ExProSE ranks an allosteric pocket first or second for 27 out of 58 allosteric proteins, which is similar and complementary to existing methods. The ExProSE source code is freely available.
Affiliation(s)
- Joe G Greener
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
| | - Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| |
|
23
|
Ittisoponpisan S, Alhuzimi E, Sternberg MJE, David A. Landscape of Pleiotropic Proteins Causing Human Disease: Structural and System Biology Insights. Hum Mutat 2017; 38:289-296. [PMID: 27957775 DOI: 10.1002/humu.23155] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/09/2016] [Accepted: 12/03/2016] [Indexed: 12/13/2022]
Abstract
Pleiotropy is the phenomenon by which the same gene can result in multiple phenotypes. Pleiotropic proteins are emerging as important contributors to rare and common disorders. Nevertheless, little is known about the mechanisms underlying pleiotropy and the characteristics of pleiotropic proteins. We analyzed disease-causing proteins reported in UniProt and observed that 12% are pleiotropic (variants in the same protein cause more than one disease). Pleiotropic proteins were enriched in deleterious and rare variants, but not in common variants. Pleiotropic proteins were more likely to be involved in the pathogenesis of neoplasms, neurological and circulatory diseases, and congenital malformations, whereas non-pleiotropic proteins were more likely to be involved in endocrine and metabolic disorders. Pleiotropic proteins were more essential and had a higher number of interacting partners compared with non-pleiotropic proteins. Significantly more pleiotropic than non-pleiotropic proteins contained at least one intrinsically long disordered region (P < 0.001). Deleterious variants occurring in structurally disordered regions were more commonly found in pleiotropic than in non-pleiotropic proteins. In conclusion, pleiotropic proteins are an important contributor to human disease. They represent a biologically different class of proteins compared with non-pleiotropic proteins, and a better understanding of their characteristics and genetic variants can greatly aid in the interpretation of genetic studies and drug design.
Affiliation(s)
- Sirawit Ittisoponpisan
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
| | - Eman Alhuzimi
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
| | - Alessia David
- Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, UK
| |
|
24
|
Ostankovitch MI, Sternberg MJE. Computation Resources for Molecular Biology: Special Issue 2017. J Mol Biol 2016; 429:345-347. [PMID: 28025039 DOI: 10.1016/j.jmb.2016.12.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, South Kensington, London SW7 2AZ, UK.
| |
|
25
|
Jiang Y, Oron TR, Clark WT, Bankapur AR, D'Andrea D, Lepore R, Funk CS, Kahanda I, Verspoor KM, Ben-Hur A, Koo DCE, Penfold-Brown D, Shasha D, Youngs N, Bonneau R, Lin A, Sahraeian SME, Martelli PL, Profiti G, Casadio R, Cao R, Zhong Z, Cheng J, Altenhoff A, Skunca N, Dessimoz C, Dogan T, Hakala K, Kaewphan S, Mehryary F, Salakoski T, Ginter F, Fang H, Smithers B, Oates M, Gough J, Törönen P, Koskinen P, Holm L, Chen CT, Hsu WL, Bryson K, Cozzetto D, Minneci F, Jones DT, Chapman S, Bkc D, Khan IK, Kihara D, Ofer D, Rappoport N, Stern A, Cibrian-Uhalte E, Denny P, Foulger RE, Hieta R, Legge D, Lovering RC, Magrane M, Melidoni AN, Mutowo-Meullenet P, Pichler K, Shypitsyna A, Li B, Zakeri P, ElShal S, Tranchevent LC, Das S, Dawson NL, Lee D, Lees JG, Sillitoe I, Bhat P, Nepusz T, Romero AE, Sasidharan R, Yang H, Paccanaro A, Gillis J, Sedeño-Cortés AE, Pavlidis P, Feng S, Cejuela JM, Goldberg T, Hamp T, Richter L, Salamov A, Gabaldon T, Marcet-Houben M, Supek F, Gong Q, Ning W, Zhou Y, Tian W, Falda M, Fontana P, Lavezzo E, Toppo S, Ferrari C, Giollo M, Piovesan D, Tosatto SCE, Del Pozo A, Fernández JM, Maietta P, Valencia A, Tress ML, Benso A, Di Carlo S, Politano G, Savino A, Rehman HU, Re M, Mesiti M, Valentini G, Bargsten JW, van Dijk ADJ, Gemovic B, Glisic S, Perovic V, Veljkovic V, Veljkovic N, Almeida-E-Silva DC, Vencio RZN, Sharan M, Vogel J, Kansakar L, Zhang S, Vucetic S, Wang Z, Sternberg MJE, Wass MN, Huntley RP, Martin MJ, O'Donovan C, Robinson PN, Moreau Y, Tramontano A, Babbitt PC, Brenner SE, Linial M, Orengo CA, Rost B, Greene CS, Mooney SD, Friedberg I, Radivojac P. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol 2016; 17:184. [PMID: 27604469 PMCID: PMC5015320 DOI: 10.1186/s13059-016-1037-6] [Citation(s) in RCA: 252] [Impact Index Per Article: 31.5] [Received: 10/26/2015] [Accepted: 08/04/2016] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Affiliation(s)
- Yuxiang Jiang
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| | | | - Wyatt T Clark
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Asma R Bankapur
- Department of Microbiology, Miami University, Oxford, OH, USA
| | | | | | - Christopher S Funk
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA
| | - Indika Kahanda
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Karin M Verspoor
- Department of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia
- Health and Biomedical Informatics Centre, University of Melbourne, Parkville, Victoria, Australia
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | | | - Duncan Penfold-Brown
- Social Media and Political Participation Lab, New York University, New York, NY, USA
- CY Data Science, New York, NY, USA
| | - Dennis Shasha
- Department of Computer Science, New York University, New York, NY, USA
| | - Noah Youngs
- CY Data Science, New York, NY, USA
- Department of Computer Science, New York University, New York, NY, USA
- Simons Center for Data Analysis, New York, NY, USA
| | - Richard Bonneau
- Department of Computer Science, New York University, New York, NY, USA
- Simons Center for Data Analysis, New York, NY, USA
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA
| | - Alexandra Lin
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
| | - Sayed M E Sahraeian
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | | | - Giuseppe Profiti
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy
| | - Renzhi Cao
- Computer Science Department, University of Missouri, Columbia, MO, USA
| | - Zhaolong Zhong
- Computer Science Department, University of Missouri, Columbia, MO, USA
| | - Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, MO, USA
| | - Adrian Altenhoff
- ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Nives Skunca
- ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Christophe Dessimoz
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
- University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tunca Dogan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Kai Hakala
- Department of Information Technology, University of Turku, Turku, Finland
- University of Turku Graduate School, University of Turku, Turku, Finland
| | - Suwisa Kaewphan
- Department of Information Technology, University of Turku, Turku, Finland
- University of Turku Graduate School, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
| | - Farrokh Mehryary
- Department of Information Technology, University of Turku, Turku, Finland
- University of Turku Graduate School, University of Turku, Turku, Finland
| | - Tapio Salakoski
- Department of Information Technology, University of Turku, Turku, Finland
- Turku Centre for Computer Science, Turku, Finland
| | - Filip Ginter
- Department of Information Technology, University of Turku, Turku, Finland
| | - Hai Fang
- University of Bristol, Bristol, UK
| | | | | | | | - Petri Törönen
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Patrik Koskinen
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Liisa Holm
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
| | - Ching-Tai Chen
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Kevin Bryson
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
| | - Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
| | - Federico Minneci
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
| | - David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, UK
| | - Samuel Chapman
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Dukka Bkc
- Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
| | - Ishita K Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Dan Ofer
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Nadav Rappoport
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Amos Stern
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Elena Cibrian-Uhalte
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Paul Denny
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| | - Rebecca E Foulger
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| | - Reija Hieta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Duncan Legge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Ruth C Lovering
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| | - Michele Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Anna N Melidoni
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| | | | - Klemens Pichler
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Aleksandra Shypitsyna
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Biao Li
- Buck Institute for Research on Aging, Novato, CA, USA
| | - Pooya Zakeri
- Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
- iMinds Department Medical Information Technologies, Leuven, Belgium
| | - Sarah ElShal
- Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium
- iMinds Department Medical Information Technologies, Leuven, Belgium
| | - Léon-Charles Tranchevent
- Inserm UMR-S1052, CNRS UMR5286, Cancer Research Centre of Lyon, Lyon, France
- Université de Lyon 1, Villeurbanne, France
- Centre Léon Bérard, Lyon, France
| | - Sayoni Das
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Natalie L Dawson
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - David Lee
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Jonathan G Lees
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
| | | | | | - Alfonso E Romero
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
| | - Rajkumar Sasidharan
- Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA, USA
| | - Haixuan Yang
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland
| | - Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
| | - Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, NY, USA
| | | | - Paul Pavlidis
- Department of Psychiatry and Michael Smith Laboratories, University of British Columbia, Vancouver, Canada
| | - Shou Feng
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| | - Juan M Cejuela
- Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany
| | - Tatyana Goldberg
- Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany
| | - Tobias Hamp
- Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany
| | - Lothar Richter
- Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany
| | - Asaf Salamov
- DOE Joint Genome Institute, Walnut Creek, CA, USA
| | - Toni Gabaldon
- Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| | - Marina Marcet-Houben
- Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Fran Supek
- Universitat Pompeu Fabra, Barcelona, Spain
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
| | - Qingtian Gong
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China
- Children's Hospital of Fudan University, Shanghai, China
| | - Wei Ning
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China
- Children's Hospital of Fudan University, Shanghai, China
| | - Yuanpeng Zhou
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China
- Children's Hospital of Fudan University, Shanghai, China
| | - Weidong Tian
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China
- Children's Hospital of Fudan University, Shanghai, China
| | - Marco Falda
- Department of Molecular Medicine, University of Padua, Padua, Italy
| | - Paolo Fontana
- Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy
| | - Enrico Lavezzo
- Department of Molecular Medicine, University of Padua, Padua, Italy
| | - Stefano Toppo
- Department of Molecular Medicine, University of Padua, Padua, Italy
| | - Carlo Ferrari
- Department of Information Engineering, University of Padua, Padova, Italy
| | - Manuel Giollo
- Department of Information Engineering, University of Padua, Padova, Italy
- Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Damiano Piovesan
- Department of Information Engineering, University of Padua, Padova, Italy
| | - Silvio C E Tosatto
- Department of Information Engineering, University of Padua, Padova, Italy
| | - Angela Del Pozo
- Instituto De Genetica Medica y Molecular, Hospital Universitario de La Paz, Madrid, Spain
| | - José M Fernández
- Spanish National Bioinformatics Institute, Spanish National Cancer Research Institute, Madrid, Spain
| | - Paolo Maietta
- Structural and Computational Biology Programme, Spanish National Cancer Research Institute, Madrid, Spain
| | - Alfonso Valencia
- Structural and Computational Biology Programme, Spanish National Cancer Research Institute, Madrid, Spain
| | - Michael L Tress
- Structural and Computational Biology Programme, Spanish National Cancer Research Institute, Madrid, Spain
| | - Alfredo Benso
- Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy
| | - Stefano Di Carlo
- Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy
| | - Gianfranco Politano
- Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy
| | - Alessandro Savino
- Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy
| | - Hafeez Ur Rehman
- National University of Computer & Emerging Sciences, Islamabad, Pakistan
| | - Matteo Re
- Anacleto Lab, Dipartimento di informatica, Università degli Studi di Milano, Milan, Italy
| | - Marco Mesiti
- Anacleto Lab, Dipartimento di informatica, Università degli Studi di Milano, Milan, Italy
| | - Giorgio Valentini
- Anacleto Lab, Dipartimento di informatica, Università degli Studi di Milano, Milan, Italy
| | - Joachim W Bargsten
- Applied Bioinformatics, Bioscience, Wageningen University and Research Centre, Wageningen, Netherlands
| | - Aalt D J van Dijk
- Applied Bioinformatics, Bioscience, Wageningen University and Research Centre, Wageningen, Netherlands
- Biometris, Wageningen University, Wageningen, Netherlands
| | - Branislava Gemovic
- Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
| | - Sanja Glisic
- Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
| | - Vladimir Perovic
- Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
| | - Veljko Veljkovic
- Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
| | - Nevena Veljkovic
- Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
| | | | - Ricardo Z N Vencio
- Department of Computing and Mathematics FFCLRP-USP, University of Sao Paulo, Ribeirao Preto, Brazil
| | - Malvika Sharan
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
| | - Jörg Vogel
- Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany
| | - Lakesh Kansakar
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Shanshan Zhang
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Slobodan Vucetic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Zheng Wang
- University of Southern Mississippi, Hattiesburg, MS, USA
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
| | - Mark N Wass
- School of Biosciences, University of Kent, Canterbury, Kent, UK
| | - Rachael P Huntley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Peter N Robinson
- Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, Berlin, Germany
| | - Yves Moreau
- Department of Electrical Engineering ESAT-SCD and IBBT-KU Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium
| | | | - Patricia C Babbitt
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, CA, USA
| | - Steven E Brenner
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
| | - Michal Linial
- Department of Chemical Biology, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Burkhard Rost
- Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, OH, USA.
- Department of Computer Science, Miami University, Oxford, OH, USA.
| | - Predrag Radivojac
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.
|
26
|
Mezulis S, Sternberg MJE, Kelley LA. PhyreStorm: A Web Server for Fast Structural Searches Against the PDB. J Mol Biol 2015; 428:702-708. [PMID: 26517951 PMCID: PMC7610957 DOI: 10.1016/j.jmb.2015.10.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/30/2015] [Revised: 10/13/2015] [Accepted: 10/18/2015] [Indexed: 11/10/2022]
Abstract
The identification of structurally similar proteins can provide a range of biological insights, and accordingly, the alignment of a query protein to a database of experimentally determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm Web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93 ± 2% of all highly similar (TM-score > 0.7) structures in the PDB for each query structure, usually in less than 60 s. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/.
Affiliation(s)
- Stefans Mezulis
- Structural Bioinformatics Group, Imperial College London, London SW7 2AZ, United Kingdom.
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Imperial College London, London SW7 2AZ, United Kingdom
| | - Lawrence A Kelley
- Structural Bioinformatics Group, Imperial College London, London SW7 2AZ, United Kingdom
|
27
|
Greener JG, Sternberg MJE. AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis. BMC Bioinformatics 2015; 16:335. [PMID: 26493317 PMCID: PMC4619270 DOI: 10.1186/s12859-015-0771-1] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 06/20/2015] [Accepted: 10/13/2015] [Indexed: 12/20/2022] [Open Access]
Abstract
BACKGROUND: Despite being hugely important in biological processes, allostery is poorly understood and no universal mechanism has been discovered. Allosteric drugs are a largely unexplored prospect with many potential advantages over orthosteric drugs. Computational methods to predict allosteric sites on proteins are needed to aid the discovery of allosteric drugs, as well as to advance our fundamental understanding of allostery.
RESULTS: AlloPred, a novel method to predict allosteric pockets on proteins, was developed. AlloPred uses perturbation of normal modes alongside pocket descriptors in a machine learning approach that ranks the pockets on a protein. AlloPred ranked an allosteric pocket top for 23 out of 40 known allosteric proteins, showing comparable and complementary performance to two existing methods. In 28 of 40 cases an allosteric pocket was ranked first or second. The AlloPred web server, freely available at http://www.sbg.bio.ic.ac.uk/allopred/home, allows visualisation and analysis of predictions. The source code and dataset information are also available from this site.
CONCLUSIONS: Perturbation of normal modes can enhance our ability to predict allosteric sites on proteins. Computational methods such as AlloPred assist drug discovery efforts by suggesting sites on proteins for further experimental study.
Affiliation(s)
- Joe G Greener
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
|
28
|
Reynolds CR, Muggleton SH, Sternberg MJE. Incorporating Virtual Reactions into a Logic-based Ligand-based Virtual Screening Method to Discover New Leads. Mol Inform 2015; 34:615-625. [PMID: 26583052 PMCID: PMC4641463 DOI: 10.1002/minf.201400162] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/31/2014] [Accepted: 01/08/2015] [Indexed: 11/28/2022]
Abstract
The use of virtual screening has become increasingly central to the drug development pipeline, with ligand-based virtual screening used to screen databases of compounds to predict their bioactivity against a target. These databases can only represent a small fraction of chemical space, and this paper describes a method of exploring synthetic space by applying virtual reactions to promising compounds within a database, and generating focussed libraries of predicted derivatives. A ligand-based virtual screening tool Investigational Novel Drug Discovery by Example (INDDEx) is used as the basis for a system of virtual reactions. The use of virtual reactions is estimated to open up a potential space of 1.21 × 10¹² potential molecules. A de novo design algorithm known as Partial Logical-Rule Reactant Selection (PLoRRS) is introduced and incorporated into the INDDEx methodology. PLoRRS uses logical rules from the INDDEx model to select reactants for the de novo generation of potentially active products. The PLoRRS method is found to increase significantly the likelihood of retrieving molecules similar to known actives with a p-value of 0.016. Case studies demonstrate that the virtual reactions produce molecules highly similar to known actives, including known blockbuster drugs.
Affiliation(s)
- Christopher R Reynolds
- Department of Bioinformatics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Stephen H Muggleton
- Department of Computing, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Michael J E Sternberg
- Department of Bioinformatics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
|
29
|
Cornish AJ, Filippis I, David A, Sternberg MJE. Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types. Genome Med 2015; 7:95. [PMID: 26330083 PMCID: PMC4557825 DOI: 10.1186/s13073-015-0212-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/12/2015] [Accepted: 07/31/2015] [Indexed: 01/09/2023] [Open Access]
Abstract
BACKGROUND: Each cell type found within the human body performs a diverse and unique set of functions, the disruption of which can lead to disease. However, there currently exists no systematic mapping between cell types and the diseases they can cause.
METHODS: In this study, we integrate protein-protein interaction data with high-quality cell-type-specific gene expression data from the FANTOM5 project to build the largest collection of cell-type-specific interactomes created to date. We develop a novel method, called gene set compactness (GSC), that contrasts the relative positions of disease-associated genes across 73 cell-type-specific interactomes to map genes associated with 196 diseases to the cell types they affect. We conduct text-mining of the PubMed database to produce an independent resource of disease-associated cell types, which we use to validate our method.
RESULTS: The GSC method successfully identifies known disease-cell-type associations, as well as highlighting associations that warrant further study. This includes mast cells and multiple sclerosis, a cell population currently being targeted in a multiple sclerosis phase 2 clinical trial. Furthermore, we build a cell-type-based diseasome using the cell types identified as manifesting each disease, offering insight into diseases linked through etiology.
CONCLUSIONS: The data set produced in this study represents the first large-scale mapping of diseases to the cell types in which they are manifested and will therefore be useful in the study of disease systems. Overall, we demonstrate that our approach links disease-associated genes to the phenotypes they produce, a key goal within systems medicine.
Affiliation(s)
- Alex J Cornish
- Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2AZ, UK.
| | - Ioannis Filippis
- Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2AZ, UK.
| | - Alessia David
- Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Department of Life Sciences, Imperial College London, Exhibition Road, London, SW7 2AZ, UK.
|
30
|
David A, Sternberg MJE. The Contribution of Missense Mutations in Core and Rim Residues of Protein-Protein Interfaces to Human Disease. J Mol Biol 2015; 427:2886-98. [PMID: 26173036 PMCID: PMC4548493 DOI: 10.1016/j.jmb.2015.07.004] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 05/01/2015] [Revised: 06/19/2015] [Accepted: 07/06/2015] [Indexed: 01/21/2023]
Abstract
Missense mutations at protein–protein interaction sites, called interfaces, are important contributors to human disease. Interfaces are non-uniform surface areas characterized by two main regions, “core” and “rim”, which differ in terms of evolutionary conservation and physicochemical properties. Moreover, within interfaces, only a small subset of residues (“hot spots”) is crucial for the binding free energy of the protein–protein complex. We performed a large-scale structural analysis of human single amino acid variations (SAVs) and demonstrated that disease-causing mutations are preferentially located within the interface core, as opposed to the rim (p < 0.01). In contrast, the interface rim is significantly enriched in polymorphisms, similar to the remaining non-interacting surface. Energetic hot spots tend to be enriched in disease-causing mutations compared to non-hot spots (p = 0.05), regardless of their occurrence in core or rim residues. For individual amino acids, the frequency of substitution into a polymorphism or disease-causing mutation differed to other amino acids and was related to its structural location, as was the type of physicochemical change introduced by the SAV. In conclusion, this study demonstrated the different distribution and properties of disease-causing SAVs and polymorphisms within different structural regions and in relation to the energetic contribution of amino acid in protein–protein interfaces, thus highlighting the importance of a structural system biology approach for predicting the effect of SAVs.

Protein–protein interactions are fundamental in all biological processes. The distribution of deleterious and non-SAVs within protein interfaces is unknown. The distribution of deleterious SAVs differs within different interface structural regions. The distribution of SAVs differs in relation to interface residues energetic contribution. Structural analysis of protein complexes enhances the understanding of deleterious SAVs.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, SW7 2AZ London, United Kingdom.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, SW7 2AZ London, United Kingdom.
|
31
|
Abstract
Protein domains are generally thought to correspond to units of evolution. New research raises questions about how such domains are defined with bioinformatics tools and sheds light on how evolution has enabled partial domains to be viable.
Affiliation(s)
- Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
|
32
|
Abstract
Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
Affiliation(s)
- Lawrence A Kelley
- Structural Bioinformatics Group, Imperial College London, London, UK
| | - Stefans Mezulis
- Structural Bioinformatics Group, Imperial College London, London, UK
| | | | - Mark N Wass
- Structural Bioinformatics Group, Imperial College London, London, UK
|
33
|
Irimia M, Weatheritt RJ, Ellis JD, Parikshak NN, Gonatopoulos-Pournatzis T, Babor M, Quesnel-Vallières M, Tapial J, Raj B, O'Hanlon D, Barrios-Rodiles M, Sternberg MJE, Cordes SP, Roth FP, Wrana JL, Geschwind DH, Blencowe BJ. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell 2015; 159:1511-23. [PMID: 25525873 DOI: 10.1016/j.cell.2014.11.035] [Citation(s) in RCA: 405] [Impact Index Per Article: 45.0] [Received: 08/07/2014] [Revised: 10/20/2014] [Accepted: 11/18/2014] [Indexed: 12/16/2022]
Abstract
Alternative splicing (AS) generates vast transcriptomic and proteomic complexity. However, which of the myriad of detected AS events provide important biological functions is not well understood. Here, we define the largest program of functionally coordinated, neural-regulated AS described to date in mammals. Relative to all other types of AS within this program, 3-15 nucleotide "microexons" display the most striking evolutionary conservation and switch-like regulation. These microexons modulate the function of interaction domains of proteins involved in neurogenesis. Most neural microexons are regulated by the neuronal-specific splicing factor nSR100/SRRM4, through its binding to adjacent intronic enhancer motifs. Neural microexons are frequently misregulated in the brains of individuals with autism spectrum disorder, and this misregulation is associated with reduced levels of nSR100. The results thus reveal a highly conserved program of dynamic microexon regulation associated with the remodeling of protein-interaction networks during neurogenesis, the misregulation of which is linked to autism.
Affiliation(s)
- Manuel Irimia
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona 08003, Spain.
| | - Robert J Weatheritt
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
| | - Jonathan D Ellis
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Neelroop N Parikshak
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
| | | | - Mariana Babor
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | | | - Javier Tapial
- EMBL/CRG Research Unit in Systems Biology, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, Barcelona 08003, Spain
| | - Bushra Raj
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Dave O'Hanlon
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
| | - Miriam Barrios-Rodiles
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Sabine P Cordes
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, ON M5S 3G4, Canada; Canadian Institute For Advanced Research, 180 Dundas Street West, Toronto, ON M5G 1Z8, Canada
| | - Jeffrey L Wrana
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, ON M5G 1X5, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada
| | - Daniel H Geschwind
- Department of Neurology, Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, 695 Charles E. Young Drive South, Los Angeles, CA 90095, USA
| | - Benjamin J Blencowe
- Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Toronto, ON M5S 1A8, Canada.
|
34
|
Di Fruscia P, Zacharioudakis E, Liu C, Moniot S, Laohasinnarong S, Khongkow M, Harrison IF, Koltsida K, Reynolds CR, Schmidtkunz K, Jung M, Chapman KL, Steegborn C, Dexter DT, Sternberg MJE, Lam EWF, Fuchter MJ. The discovery of a highly selective 5,6,7,8-tetrahydrobenzo[4,5]thieno[2,3-d]pyrimidin-4(3H)-one SIRT2 inhibitor that is neuroprotective in an in vitro Parkinson's disease model. ChemMedChem 2014; 10:69-82. [PMID: 25395356 DOI: 10.1002/cmdc.201402431] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2014] [Indexed: 02/03/2023]
Abstract
Sirtuins, NAD(+)-dependent histone deacetylases (HDACs), have recently emerged as potential therapeutic targets for the treatment of a variety of diseases. The discovery of potent and isoform-selective inhibitors of this enzyme family should provide chemical tools to help determine the roles of these targets and validate their therapeutic value. Herein, we report the discovery of a novel class of highly selective SIRT2 inhibitors, identified by pharmacophore screening. We report the identification and validation of 3-((2-methoxynaphthalen-1-yl)methyl)-7-((pyridin-3-ylmethyl)amino)-5,6,7,8-tetrahydrobenzo[4,5]thieno[2,3-d]pyrimidin-4(3H)-one (ICL-SIRT078), a substrate-competitive SIRT2 inhibitor with a Ki value of 0.62 ± 0.15 μM and more than 50-fold selectivity against SIRT1, 3 and 5. Treatment of MCF-7 breast cancer cells with ICL-SIRT078 results in hyperacetylation of α-tubulin, an established SIRT2 biomarker, at doses comparable with the biochemical IC50 data, while suppressing MCF-7 proliferation at higher concentrations. In concordance with the recent reports that suggest SIRT2 inhibition is a potential strategy for the treatment of Parkinson's disease, we find that compound ICL-SIRT078 has a significant neuroprotective effect in a lactacystin-induced model of Parkinsonian neuronal cell death in the N27 cell line. These results encourage further investigation into the effects of ICL-SIRT078, or an optimised derivative thereof, as a candidate neuroprotective agent in in vivo models of Parkinson's disease.
Affiliation(s)
- Paolo Di Fruscia
- Department of Chemistry, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
|
35
|
Lewis TE, Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Chothia C, Cozzetto D, Dana JM, Filippis I, Gough J, Jones DT, Kelley LA, Kleywegt GJ, Minneci F, Mistry J, Murzin AG, Ochoa-Montaño B, Oates ME, Punta M, Rackham OJL, Stahlhacke J, Sternberg MJE, Velankar S, Orengo C. Genome3D: exploiting structure to help users understand their sequences. Nucleic Acids Res 2014; 43:D382-6. [PMID: 25348407 PMCID: PMC4384030 DOI: 10.1093/nar/gku973] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
Affiliation(s)
- Tony E Lewis
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
| | - Antonina Andreeva
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
| | - Tom L Blundell
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
| | - Daniel W A Buchan
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
| | - Cyrus Chothia
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
| | - Domenico Cozzetto
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
| | - José M Dana
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Ioannis Filippis
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Julian Gough
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - David T Jones
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK; Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
| | - Lawrence A Kelley
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Gerard J Kleywegt
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Federico Minneci
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
| | - Jaina Mistry
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Alexey G Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
| | - Bernardo Ochoa-Montaño
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
| | - Matt E Oates
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Marco Punta
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Owen J L Rackham
- MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN, UK
| | - Jonathan Stahlhacke
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
| | - Sameer Velankar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
|
36
|
Talman AM, Prieto JH, Marques S, Ubaida-Mohien C, Lawniczak M, Wass MN, Xu T, Frank R, Ecker A, Stanway RS, Krishna S, Sternberg MJE, Christophides GK, Graham DR, Dinglasan RR, Yates JR, Sinden RE. Proteomic analysis of the Plasmodium male gamete reveals the key role for glycolysis in flagellar motility. Malar J 2014; 13:315. [PMID: 25124718 PMCID: PMC4150949 DOI: 10.1186/1475-2875-13-315] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2014] [Accepted: 07/28/2014] [Indexed: 12/22/2022] Open
Abstract
Background: Gametogenesis and fertilization play crucial roles in malaria transmission. While male gametes are thought to be amongst the simplest eukaryotic cells and are proven targets of transmission blocking immunity, little is known about their molecular organization. For example, the pathway of energy metabolism that powers motility, a feature that facilitates gamete encounter and fertilization, is unknown. Methods: Plasmodium berghei microgametes were purified and analysed by whole-cell proteomic analysis for the first time. Data are available via ProteomeXchange with identifier PXD001163. Results: 615 proteins were recovered, including all male gamete proteins described thus far. Amongst them were the 11 enzymes of the glycolytic pathway. The hexose transporter was localized to the gamete plasma membrane and it was shown that microgamete motility can be suppressed effectively by inhibitors of this transporter and of the glycolytic pathway. Conclusions: This study describes the first whole-cell proteomic analysis of the malaria male gamete. It identifies glycolysis as the likely exclusive source of energy for flagellar beat, and provides new insights into original features of Plasmodium flagellar organization. Electronic supplementary material: The online version of this article (doi:10.1186/1475-2875-13-315) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Arthur M Talman
- Division of Cell and Molecular Biology, Imperial College, London, UK.
|
37
|
Yates CM, Filippis I, Kelley LA, Sternberg MJE. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692-701. [PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026] [Citation(s) in RCA: 159] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/16/2022]
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Bioinformatics approaches are key for identification of disease-causing variants. SAV phenotype prediction can be improved using network information. A method including these features, SuSPect, outperforms tested methods. SuSPect is available to use at www.sbg.bio.ic.ac.uk/suspect.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK.
| | - Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
| | - Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
|
38
|
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013] [Indexed: 12/23/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.
|
39
|
Bryant WA, Sternberg MJE, Pinney JW. AMBIENT: Active Modules for Bipartite Networks--using high-throughput transcriptomic data to dissect metabolic response. BMC Syst Biol 2013; 7:26. [PMID: 23531303 PMCID: PMC3656802 DOI: 10.1186/1752-0509-7-26] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2012] [Accepted: 03/01/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: With the continued proliferation of high-throughput biological experiments, there is a pressing need for tools to integrate the data produced in ways that produce biologically meaningful conclusions. Many microarray studies have analysed transcriptomic data from a pathway perspective, for instance by testing for KEGG pathway enrichment in sets of upregulated genes. However, the increasing availability of species-specific metabolic models provides the opportunity to analyse these data in a more objective, system-wide manner. RESULTS: Here we introduce ambient (Active Modules for Bipartite Networks), a simulated annealing approach to the discovery of metabolic subnetworks (modules) that are significantly affected by a given genetic or environmental change. The metabolic modules returned by ambient are connected parts of the bipartite network that change coherently between conditions, providing a more detailed view of metabolic changes than standard approaches based on pathway enrichment. CONCLUSIONS: ambient is an effective and flexible tool for the analysis of high-throughput data in a metabolic context. The same approach can be applied to any system in which reactions (or metabolites) can be assigned a score based on some biological observation, without the limitation of predefined pathways. A Python implementation of ambient is available at http://www.theosysbio.bio.ic.ac.uk/ambient.
Affiliation(s)
- William A Bryant
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, SW7 2AZ, UK.
|
40
|
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DWA, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJE, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YAI, van Dijk ADJ, ter Braak CJF, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, Friedberg I. A large-scale evaluation of computational protein function prediction. Nat Methods 2013; 10:221-7. [PMID: 23353650 PMCID: PMC3584181 DOI: 10.1038/nmeth.2340] [Citation(s) in RCA: 564] [Impact Index Per Article: 51.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2012] [Accepted: 12/10/2012] [Indexed: 01/03/2023]
Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Affiliation(s)
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
|
41
|
Yates CM, Sternberg MJE. Proteins and domains vary in their tolerance of non-synonymous single nucleotide polymorphisms (nsSNPs). J Mol Biol 2013; 425:1274-86. [PMID: 23357174 DOI: 10.1016/j.jmb.2013.01.026] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2012] [Revised: 01/11/2013] [Accepted: 01/19/2013] [Indexed: 02/05/2023]
Abstract
The widespread application of whole-genome sequencing is identifying numerous non-synonymous single nucleotide polymorphisms (nsSNPs), many of which are associated with disease. We analyzed nsSNPs from Humsavar and the 1000 Genomes Project to investigate why some proteins and domains are more tolerant of mutations than others. We identified 311 proteins and 112 Pfam families, corresponding to 2910 domains, as disease-susceptible and 32 proteins and 67 Pfam families (10,783 domains) as disease-resistant based on the relative numbers of disease-associated and neutral polymorphisms. Proteins with no significant difference from expected numbers of disease and polymorphism nsSNPs are classified as other. This classification takes into account the phenotypes of all known mutations in the protein or domain rather than simply classifying based on the presence or absence of disease nsSNPs. Of the two hypotheses suggested, our results support the model that disease-resistant domains and proteins are more able to tolerate mutations rather than having more lethal mutations that are not observed. Disease-resistant proteins and domains show significantly higher mutation rates and lower sequence conservation than disease-susceptible proteins and domains. Disease-susceptible proteins are more likely to be encoded by essential genes, are more central in protein-protein interaction networks and are less likely to contain loss-of-function mutations in healthy individuals. We use this classification for nsSNP phenotype prediction, predicting nsSNPs in disease-susceptible domains to be disease and those in disease-resistant domains to be polymorphism. In this way, we achieve higher accuracy than SIFT, a state-of-the-art algorithm.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, Sir Ernst Chain Building, South Kensington, London SW7 2AZ, UK.
|
42
|
Abstract
An 'intrinsically disordered protein' (IDP) is assumed to be unfolded in the cell and perform its biological function in that state. We contend that most intrinsically disordered proteins are in fact proteins waiting for a partner (PWPs), parts of a multi-component complex that do not fold correctly in the absence of other components. Flexibility, not disorder, is an intrinsic property of proteins, exemplified by X-ray structures of many enzymes and protein-protein complexes. Disorder is often observed with purified proteins in vitro and sometimes also in crystals, where it is difficult to distinguish from flexibility. In the crowded environment of the cell, disorder is not compatible with the known mechanisms of protein-protein recognition, and, foremost, with its specificity. The self-assembly of multi-component complexes may, nevertheless, involve the specific recognition of nascent polypeptide chains that are incompletely folded, but then disorder is transient, and it must remain under the control of molecular chaperones and of the quality control apparatus that obviates the toxic effects it can have on the cell.
Affiliation(s)
- Joël Janin
- Institut de Biochimie et Biophysique Moléculaire et Cellulaire, Université Paris-Sud 91405-Orsay, France
|
43
|
Abstract
The acid-labile subunit (ALS) is the main regulator of IGF1 and IGF2 bioavailability. ALS deficiency caused by mutations in the ALS (IGFALS) gene often results in mild short stature in adulthood. Little is known about the ALS structure-function relationship. A structural model built in 1999 suggested a doughnut shape, which has never been observed in the leucine-rich repeat (LRR) superfamily, to which ALS belongs. In this study, we built a new ALS structural model, analysed its glycosylation and charge distribution and studied mechanisms by which missense mutations affect protein structure. We used three structure prediction servers and integrated their results with information derived from ALS experimental studies. The ALS model was built at high confidence using Toll-like receptor protein templates and resembled a horseshoe with an extensively negatively charged concave surface. Enrichment in prolines and disulphide bonds was found at the ALS N- and C-termini. Moreover, seven N-glycosylation sites were identified and mapped. ALS mutations were predicted to affect protein structure by causing loss of hydrophobic interactions (p.Leu134Gln), alteration of the amino acid backbone (p.Leu241Pro, p.Leu172Phe and p.Leu244Phe), loss of disulphide bridges (p.Cys60Ser and p.Cys540Arg), change in structural constraints (p.Pro73Leu), creation of novel glycosylation sites (p.Asp440Asn) or alteration of LRRs (p.Asn276Ser). In conclusion, our ALS structural model was identified as a highly confident prediction by three independent methods and disagrees with the previously published ALS model. The new model allowed us to analyse the ALS core and its caps and to interpret the potential structural effects of ALS mutations.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Division of Molecular Biosciences, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
|
44
|
Lewis TE, Sillitoe I, Andreeva A, Blundell TL, Buchan DW, Chothia C, Cuff A, Dana JM, Filippis I, Gough J, Hunter S, Jones DT, Kelley LA, Kleywegt GJ, Minneci F, Mitchell A, Murzin AG, Ochoa-Montaño B, Rackham OJL, Smith J, Sternberg MJE, Velankar S, Yeats C, Orengo C. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Res 2012. [PMID: 23203986 PMCID: PMC3531217 DOI: 10.1093/nar/gks1266] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Affiliation(s)
- Tony E. Lewis
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
- *To whom correspondence should be addressed. Tel: +44 2076 792171; Fax: +44 2076 797193;
| | - Antonina Andreeva
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Tom L. Blundell
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Daniel W.A. Buchan
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Cyrus Chothia
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Alison Cuff
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Jose M. Dana
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Ioannis Filippis
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Julian Gough
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Sarah Hunter
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - David T. Jones
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Lawrence A. Kelley
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Gerard J. Kleywegt
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Federico Minneci
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Alex Mitchell
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Alexey G. Murzin
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Bernardo Ochoa-Montaño
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Owen J. L. Rackham
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - James Smith
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Michael J. E. Sternberg
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Sameer Velankar
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Corin Yeats
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK, Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK, Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK and Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
|
45
|
Sternberg MJE, Tamaddoni-Nezhad A, Lesk VI, Kay E, Hitchen PG, Cootes A, van Alphen LB, Lamoureux MP, Jarrell HC, Rawlings CJ, Soo EC, Szymanski CM, Dell A, Wren BW, Muggleton SH. Gene function hypotheses for the Campylobacter jejuni glycome generated by a logic-based approach. J Mol Biol 2012; 425:186-97. [PMID: 23103756 PMCID: PMC3546167 DOI: 10.1016/j.jmb.2012.10.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 08/16/2012] [Revised: 10/15/2012] [Accepted: 10/17/2012] [Indexed: 11/26/2022]
Abstract
Increasingly, experimental data on biological systems are obtained from several sources and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming that automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide consisting of 15 genes of which 8 have assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning—the results of knockout experiments on the genes involved in capsule formation and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps, there was an unambiguous optimal assignment of gene to reaction, and to the fifth, there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and propose hypotheses for the behaviour of a biological system.
Affiliation(s)
- Michael J E Sternberg
- Centre for Integrative Systems Biology, Imperial College London, London SW7 2AZ, UK.
|
46
|
Santos JCA, Nassif H, Page D, Muggleton SH, Sternberg MJE. Automated identification of protein-ligand interaction features using Inductive Logic Programming: a hexose binding case study. BMC Bioinformatics 2012; 13:162. [PMID: 22783946 PMCID: PMC3458898 DOI: 10.1186/1471-2105-13-162] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/08/2011] [Accepted: 06/15/2012] [Indexed: 12/02/2022]
Abstract
BACKGROUND: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. RESULTS: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. CONCLUSIONS: In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.
Affiliation(s)
- Jose C A Santos
- Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK
| | - Houssam Nassif
- Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA
| | - David Page
- Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI-53706, USA
| | - Stephen H Muggleton
- Computational Bioinformatics Laboratory, Department of Computer Science, Imperial College London, London, SW7 2BZ, UK
| | - Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
|
47
|
Abstract
Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc incorporates ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein–protein interaction data. In benchmarking on a set of 1686 proteins CombFunc obtains precision and recall of 0.71 and 0.64 respectively for gene ontology molecular function terms. For biological process GO terms precision of 0.74 and recall of 0.41 is obtained. CombFunc is available at http://www.sbg.bio.ic.ac.uk/combfunc.
Affiliation(s)
- Mark N Wass
- Centre for Bioinformatics, Imperial College London, London, SW7 2AZ, UK.
|
48
|
Reynolds CR, Amini AC, Muggleton SH, Sternberg MJE. Assessment of a Rule-Based Virtual Screening Technology (INDDEx) on a Benchmark Data Set. J Phys Chem B 2012; 116:6732-9. [DOI: 10.1021/jp212084f] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Christopher R. Reynolds
- Department of Life Science, Imperial College London, London, SW7 2AZ United Kingdom
- Equinox Pharma Ltd., Incubator, Bessemer Building, Prince Consort Road, London, SW7 2BP United Kingdom
| | - Ata C. Amini
- Equinox Pharma Ltd., Incubator, Bessemer Building, Prince Consort Road, London, SW7 2BP United Kingdom
| | - Stephen H. Muggleton
- Department of Computing, Imperial College London, London, SW7 2BZ United Kingdom
- Equinox Pharma Ltd., Incubator, Bessemer Building, Prince Consort Road, London, SW7 2BP United Kingdom
| | - Michael J. E. Sternberg
- Department of Life Science, Imperial College London, London, SW7 2AZ United Kingdom
- Equinox Pharma Ltd., Incubator, Bessemer Building, Prince Consort Road, London, SW7 2BP United Kingdom
|
49
|
Phan HTT, Sternberg MJE. PINALOG: a novel approach to align protein interaction networks--implications for complex detection and function prediction. Bioinformatics 2012; 28:1239-45. [PMID: 22419782 PMCID: PMC3338015 DOI: 10.1093/bioinformatics/bts119] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 11/29/2022]
Abstract
Motivation: Analysis of protein–protein interaction networks (PPINs) at the system level has become increasingly important in understanding biological processes. Comparison of the interactomes of different species not only provides a better understanding of species evolution but also helps with detecting conserved functional components and in function prediction. Method and Results: Here we report a PPIN alignment method, called PINALOG, which combines information from protein sequence, function and network topology. Alignment of human and yeast PPINs reveals several conserved subnetworks between them that participate in similar biological processes, notably the proteasome and transcription related processes. PINALOG has been tested for its power in protein complex prediction as well as function prediction. Comparison with PSI-BLAST in predicting protein function in the twilight zone also shows that PINALOG is valuable in predicting protein function. Availability and implementation: The PINALOG web-server is freely available from http://www.sbg.bio.ic.ac.uk/~pinalog. The PINALOG program and associated data are available from the Download section of the web-server. Contact: m.sternberg@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hang T T Phan
- Division of Molecular Biosciences, Faculty of Natural Sciences, Imperial College, London, UK
|
50
|
Di Fruscia P, Ho KK, Laohasinnarong S, Khongkow M, Kroll SHB, Islam SA, Sternberg MJE, Schmidtkunz K, Jung M, Lam EWF, Fuchter MJ. The Discovery of Novel 10,11-Dihydro-5H-dibenz[b,f]azepine SIRT2 Inhibitors. Medchemcomm 2012. [PMID: 24340169 DOI: 10.1039/c2md00290f] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
Isoform-selective inhibitors of the sirtuins (NAD+-dependent histone deacetylases) should enable an in-depth study of the molecular biology underpinning these targets and how they are deregulated in diseases such as cancer and neurodegeneration. Herein, we present the discovery of structurally novel SIRT2 inhibitors. Hit molecule 8 was discovered through the chemical synthesis and biological characterization of a small-molecule compound library based around the 10,11-dihydro-5H-dibenz[b,f]azepine scaffold. In vitro screening assays revealed compound 8 to have an IC50 of 18 μM against SIRT2 and to exhibit more than 30-fold selectivity compared to SIRT1. Cellular assays, performed on MCF-7 cells, confirmed the in vitro selectivity and showed hit 8 to have antiproliferative activity at a concentration of 30 μM. Computational studies were performed to predict the SIRT2 binding mode and to rationalise the observed selectivity.
Affiliation(s)
- Paolo Di Fruscia
- Department of Chemistry, Imperial College London, London, United Kingdom
|