1
Abbass J, Parisi C. Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets. J Biomol Struct Dyn 2024:1-16. [PMID: 38505995 DOI: 10.1080/07391102.2024.2328736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
In addition to the growth in the number of protein structures generated through wet-laboratory experiments and deposited in the PDB repository, AlphaFold predictions have contributed significantly to a much larger database of protein structures. Annotating such a vast number of structures has become an increasingly challenging task. CATH is widely recognized as one of the most common platforms for addressing this challenge, as it classifies proteins based on their structural and evolutionary relationships, offering the scientific community an invaluable resource for uncovering various properties, including functional annotations. Since CATH annotation involves, to some extent, human intervention, keeping up with the classification of the rapidly expanding repositories of protein structures has become exceedingly difficult. Therefore, there is a pressing need for a fully automated approach. In addition, the abundance of protein sequences stemming from next-generation sequencing technologies, which lack structural annotations, presents a further challenge to the scientific community. Consequently, 'pre-annotating' protein sequences with structural features at a high level of precision could prove highly advantageous. In this paper, after a thorough investigation, we introduce a novel machine-learning model capable of classifying any protein domain, whether or not its structure is known, into one of the 40 main CATH Architectures. We achieve an F1 score of 0.92 using only the amino acid sequence and 0.94 using both the amino acid sequence and the sequence of structural alphabets. Communicated by Ramaswamy H. Sarma.
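The abstract reports F1 scores over the 40 CATH Architecture classes but does not say how the per-class scores were averaged. As a hedged illustration only, the Python sketch below computes a macro-averaged F1, which is the unweighted mean of the per-class values F1 = 2TP / (2TP + FP + FN). The choice of macro averaging and the toy labels are assumptions. This is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    fp = Counter(p for t, p in zip(y_true, y_pred) if t != p)  # predicted p, but wrong
    fn = Counter(t for t, p in zip(y_true, y_pred) if t != p)  # missed true class t
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with CATH-style architecture codes (illustrative labels only).
y_true = ["1.10", "1.10", "2.40", "3.40"]
y_pred = ["1.10", "2.40", "2.40", "3.40"]
print(macro_f1(y_true, y_pred))
```

In this toy case, classes 1.10 and 2.40 each score 2/3 and class 3.40 scores 1, so the macro average is 7/9. A weighted average, which weights each class by its support, would give a different number on imbalanced data.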
Affiliation(s)
- Jad Abbass
- School of Computer Science and Mathematics, Kingston University, London, UK
- Charles Parisi
- School of Computer Science and Mathematics, Kingston University, London, UK
- Telecom Physique Strasbourg, Strasbourg University, Strasbourg, France
2
Taheri-Ledari M, Zandieh A, Shariatpanahi SP, Eslahchi C. Assignment of structural domains in proteins using diffusion kernels on graphs. BMC Bioinformatics 2022; 23:369. [PMID: 36076174 PMCID: PMC9461149 DOI: 10.1186/s12859-022-04902-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Accepted: 08/23/2022] [Indexed: 11/10/2022]
Abstract
Though proposing algorithmic approaches for protein domain decomposition has been of high interest, the inherent ambiguity of the problem keeps it an active area of research. Moreover, accurate automated methods are in high demand as the number of solved structures of complex proteins rises. While the majority of previous efforts to decompose 3D structures have centered on developing clustering algorithms, employing enhanced measures of proximity between amino acids has remained largely uncharted. If there exists a kernel function in whose reproducing kernel Hilbert space the structural domains of proteins become well separated, then protein structures can be parsed into domains without the need for a complex clustering algorithm. Inspired by this idea, we developed a protein domain decomposition method based on diffusion kernels on protein graphs. We examined all combinations of four graph node kernels and two clustering algorithms to investigate their capability to decompose protein structures. The proposed method was tested on five of the most commonly used benchmark datasets for protein domain assignment plus a comprehensive non-redundant dataset. The results show competitive performance for the method using one of the diffusion kernels compared with four of the best automatic methods. Our method can also offer alternative partitionings of the same structure, which is in line with the subjective definition of a protein domain. With competitive accuracy and balanced performance on simple and complex structures, despite relying on a relatively naive criterion to choose the optimal decomposition, the proposed method revealed that diffusion kernels on graphs in particular, and kernel functions in general, are promising measures to facilitate parsing proteins into domains and performing various structural analyses of proteins.
The size and interconnectedness of protein graphs make them promising targets for diffusion kernels as measures of affinity between amino acids. The versatility of our method allows future, higher-performing kernels to be plugged in. The source code of the proposed method is available at https://github.com/taherimo/kludo, and the method is also available as a web application at https://cbph.ir/tools/kludo.
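The central idea above is to measure amino-acid affinity with a diffusion kernel on a residue graph. The Python sketch below illustrates it without dependencies, under several stated assumptions:

- It uses the classic diffusion kernel K = exp(-βL) on an unweighted graph, where L = D - A is the graph Laplacian.
- A toy six-node graph stands in for a residue contact graph.
- A hypothetical two-seed assignment replaces the clustering algorithms the authors actually evaluated.

It is not the code in the kludo repository.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(X, Y):
    """Plain product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(n, edges, beta=0.5, terms=40):
    """K = exp(-beta * L), approximated by a truncated Taylor series.

    Adequate for small toy graphs; a real implementation would use an
    eigendecomposition or a library matrix exponential instead.
    """
    M = [[-beta * v for v in row] for row in laplacian(n, edges)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    K = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # term now holds M^k / k!, built incrementally from M^(k-1) / (k-1)!
        term = [[v / k for v in row] for row in matmul(term, M)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


def assign_to_seeds(K, seeds):
    """Hypothetical naive split: give each node to the seed it has most affinity with."""
    return [max(range(len(seeds)), key=lambda s: K[i][seeds[s]])
            for i in range(len(K))]


# Toy "contact graph": two triangles (0-1-2 and 3-4-5) joined by one edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
K = diffusion_kernel(6, edges)
print(assign_to_seeds(K, seeds=[0, 5]))
```

Because exp(-βL) is the transition matrix of a continuous-time random walk, its rows sum to 1, and affinity decays across the single bridging edge. As a result, the two triangles end up in separate groups. The parameter β controls how far the diffusion spreads. Its value here (0.5) is an arbitrary choice for the toy graph.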
Affiliation(s)
- Mohammad Taheri-Ledari
- Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Amirali Zandieh
- Department of Biophysics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Seyed Peyman Shariatpanahi
- Department of Biophysics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Changiz Eslahchi
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
3
Torres PHM, Rossi AD, Blundell TL. ProtCHOIR: a tool for proteome-scale generation of homo-oligomers. Brief Bioinform 2021; 22:bbab182. [PMID: 34015821 PMCID: PMC8574958 DOI: 10.1093/bib/bbab182] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Revised: 04/04/2021] [Accepted: 04/20/2021] [Indexed: 01/10/2023]
Abstract
The rapid developments in gene sequencing technologies achieved in recent decades, along with the expansion of knowledge on the three-dimensional structures of proteins, have enabled the construction of proteome-scale databases of protein models such as Genome3D and ModBase. Nevertheless, although gene products are usually expressed as individual polypeptide chains, most biological processes are associated with either transient or stable oligomerisation. In the PDB, for example, ~40% of the deposited structures contain at least one homo-oligomeric interface. Unfortunately, databases of protein models are generally devoid of multimeric structures. To tackle this issue, we have developed ProtCHOIR, a tool that generates homo-oligomeric structures in an automated fashion, providing detailed information for the input protein and output complex. ProtCHOIR takes as input either a sequence or a protomeric structure, which is queried against a pre-constructed local database of homo-oligomeric structures and then extensively analyzed using well-established tools such as PSI-Blast, MAFFT, PISA and Molprobity. Finally, MODELLER is employed to build the homo-oligomers. The output complex is thoroughly analyzed, taking into account its stereochemical quality, interfacial stabilities, hydrophobicity and conservation profile. All these data are then summarized in a user-friendly HTML report that can be saved or printed as a PDF file. The software is easily parallelizable and also outputs a comma-separated file with summary statistics that can be straightforwardly concatenated into a spreadsheet-like document for large-scale data analyses. As a proof of concept, we built oligomeric models for the Mabellini Mycobacterium abscessus structural proteome database. ProtCHOIR can be run as a web service, and the code can be obtained free of charge at http://lmdm.biof.ufrj.br/protchoir.
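The abstract notes that each ProtCHOIR run writes a comma-separated summary file, and that these files can be concatenated for large-scale analyses. A minimal sketch of that concatenation step in Python follows. It assumes every file shares the same header row. The column names in the example are hypothetical, not ProtCHOIR's actual output columns.

```python
import csv
import io


def concatenate_summaries(csv_texts):
    """Stack per-run summary CSVs that share one header into a single table."""
    header = None
    rows = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("summary files have different columns")
        rows.extend(reader)  # remaining rows are data
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical per-run summaries (column names are illustrative only).
run_a = "input,oligomeric_state\nprotA,2\n"
run_b = "input,oligomeric_state\nprotB,4\n"
print(concatenate_summaries([run_a, run_b]))
```

In practice, the inputs would be read from the per-run files, for example with `[open(p).read() for p in paths]`. The strict header check keeps files with different columns from being mixed silently.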
4
Méndez-Álvarez D, Herrera-Mayorga V, Juárez-Saldivar A, Paz-González AD, Ortiz-Pérez E, Bandyopadhyay D, Pérez-Sánchez H, Rivera G. Ligand-based virtual screening, molecular docking, and molecular dynamics of eugenol analogs as potential acetylcholinesterase inhibitors with biological activity against Spodoptera frugiperda. Mol Divers 2021; 26:2025-2037. [PMID: 34529209 DOI: 10.1007/s11030-021-10312-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Accepted: 09/02/2021] [Indexed: 11/26/2022]
Abstract
The development of new, more selective, environmentally friendly insecticide alternatives is in high demand for the control of Spodoptera frugiperda (S. frugiperda). The main objective of this work was to search for new potential S. frugiperda acetylcholinesterase (AChE) inhibitors. A ligand-based virtual screening was initially carried out considering six scaffolds derived from eugenol and the ZINC15, PubChem, and MolPort databases. Subsequently, the selected compounds were docked into the active site and into a second region (identified by blind molecular docking) of S. frugiperda AChE. Molecular dynamics and Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) analyses were also applied to refine the docking results. Finally, three new eugenol analogs were evaluated in vitro against S. frugiperda larvae. The virtual screening identified 1609 compounds from the chemical libraries. Control compounds were selected from the molecular docking interaction fingerprints. Only three new eugenol analogs (1, 3, and 4) remained stable over 50 ns of molecular dynamics. Compounds 1 and 4 had the best biological activity by diet (LC50 = 0.042 mg/mL) and by topical route (LC50 = 0.027 mg/mL), respectively. At least three new eugenol derivatives possessed good-to-excellent insecticidal activity against S. frugiperda.
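The Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) rescoring mentioned above is usually written as the end-point decomposition below. The notation is:

- Angle brackets denote averages over MD snapshots.
- G_PB is the polar solvation energy from the Poisson-Boltzmann equation.
- G_SA is the nonpolar term, proportional to the solvent-accessible surface area.

This is the generic textbook form. The abstract does not say which terms were used (for example, the entropic -TS term is often omitted), nor which values of γ and b.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle \\
G_X &= E_{\text{MM}} + G_{\text{PB}} + G_{\text{SA}} - T S \\
E_{\text{MM}} &= E_{\text{bonded}} + E_{\text{elec}} + E_{\text{vdW}} \\
G_{\text{SA}} &= \gamma \, \mathrm{SASA} + b
\end{aligned}
```

Here X stands for the complex, the receptor or the ligand. When all three are taken from a single complex trajectory, the bonded terms cancel in the difference.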
Affiliation(s)
- Domingo Méndez-Álvarez
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, 88710, Reynosa, Tamaulipas, México
- Verónica Herrera-Mayorga
- Departamento de Ingeniería Bioquímica, Unidad Académica Multidisciplinaria Mante, Universidad Autónoma de Tamaulipas, 89840, Mante, Tamaulipas, México
- Alfredo Juárez-Saldivar
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, 88710, Reynosa, Tamaulipas, México
- Alma D Paz-González
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, 88710, Reynosa, Tamaulipas, México
- Eyra Ortiz-Pérez
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, 88710, Reynosa, Tamaulipas, México
- Debasish Bandyopadhyay
- Department of Chemistry and SEEMS, University of Texas Rio Grande Valley, Edinburg, TX, 78539, USA
- Horacio Pérez-Sánchez
- Structural Bioinformatics and High-Performance Computing Research Group (BIO-HPC), Computer Engineering Department, Universidad Católica San Antonio De Murcia (UCAM), 30107, Murcia, Spain
- Gildardo Rivera
- Laboratorio de Biotecnología Farmacéutica, Centro de Biotecnología Genómica, Instituto Politécnico Nacional, 88710, Reynosa, Tamaulipas, México
5
Wang Y, Zhang H, Zhong H, Xue Z. Protein domain identification methods and online resources. Comput Struct Biotechnol J 2021; 19:1145-1153. [PMID: 33680357 PMCID: PMC7895673 DOI: 10.1016/j.csbj.2021.01.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/08/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 01/03/2023]
Abstract
Protein domains are the basic units of proteins that can fold, function, and evolve independently. Knowledge of protein domains is critical for protein classification, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Thus, over the past two decades, a number of protein domain identification approaches have been developed, and a variety of protein domain databases have also been constructed. This review divides protein domain prediction methods into two categories, namely sequence-based and structure-based. These methods are introduced in detail, and their advantages and limitations are compared. Furthermore, this review also provides a comprehensive overview of popular online protein domain sequence and structure databases. Finally, we discuss potential improvements of these prediction methods.
Affiliation(s)
- Yan Wang
- Institute of Medical Artificial Intelligence, Binzhou Medical College, Yantai, Shandong 264003, China
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hang Zhang
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Haolin Zhong
- School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Zhidong Xue
- School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
6
Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Finn RD, Gough J, Jones D, Kelley LA, Paysan-Lafosse T, Lam SD, Murzin AG, Pandurangan AP, Salazar GA, Skwark MJ, Sternberg MJE, Velankar S, Orengo C. Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation. Nucleic Acids Res 2020; 48:D314-D319. [PMID: 31733063 PMCID: PMC7139969 DOI: 10.1093/nar/gkz967] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/23/2019] [Revised: 10/09/2019] [Accepted: 11/07/2019] [Indexed: 12/20/2022]
Abstract
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no need to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements make it easier to open Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have nearly doubled the number of annotations in Genome3D since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Daniel W A Buchan
- Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Robert D Finn
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Julian Gough
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- David Jones
- Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK
- The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Lawrence A Kelley
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Typhaine Paysan-Lafosse
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
- Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
- Alexey G Murzin
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Gustavo A Salazar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marcin J Skwark
- Department of Biochemistry, University of Cambridge, Old Addenbrooke’s Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Sameer Velankar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
7
Waman VP, Blundell TL, Buchan DWA, Gough J, Jones D, Kelley L, Murzin A, Pandurangan AP, Sillitoe I, Sternberg M, Torres P, Orengo C. The Genome3D Consortium for Structural Annotations of Selected Model Organisms. Methods Mol Biol 2020; 2165:27-67. [PMID: 32621218 DOI: 10.1007/978-1-0716-0708-4_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal providing both predicted models and annotations of proteins in model organisms, using several resources developed by these labs, such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare the structural classifications of protein domains in the CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses identified a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms: Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Genome3D thus serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions, helping researchers broaden their perspective on structure/function predictions for their query protein in selected model organisms.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Daniel W A Buchan
- Department of Computer Science, University College London, London, UK
- Julian Gough
- MRC Laboratory of Molecular Biology, Cambridge, UK
- David Jones
- Department of Computer Science, University College London, London, UK
- Lawrence Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, UK
- Michael Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK
- Pedro Torres
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
8
Simpkin AJ, Thomas JMH, Simkovic F, Keegan RM, Rigden DJ. Molecular replacement using structure predictions from databases. Acta Crystallogr D Struct Biol 2019; 75:1051-1062. [PMID: 31793899 PMCID: PMC6889911 DOI: 10.1107/s2059798319013962] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/22/2019] [Accepted: 10/12/2019] [Indexed: 01/19/2023]
Abstract
Molecular replacement (MR) is the predominant route to solution of the phase problem in macromolecular crystallography. Where the lack of a suitable homologue precludes conventional MR, one option is to predict the target structure using bioinformatics. Such modelling, in the absence of homologous templates, is called ab initio or de novo modelling. Recently, the accuracy of such models has improved significantly as a result of the availability, in many cases, of residue-contact predictions derived from evolutionary covariance analysis. Covariance-assisted ab initio models representing structurally uncharacterized Pfam families are now available on a large scale in databases, potentially representing a valuable and easily accessible supplement to the PDB as a source of search models. Here, the unconventional MR pipeline AMPLE is employed to explore the value of structure predictions in the GREMLIN and PconsFam databases. It was tested whether these deposited predictions, processed in various ways, could solve the structures of PDB entries that were subsequently deposited. The results were encouraging: nine of 27 GREMLIN cases were solved, covering target lengths of 109-355 residues and a resolution range of 1.4-2.9 Å, and with target-model shared sequence identity as low as 20%. The cluster-and-truncate approach in AMPLE proved to be essential for most successes. For the overall lower quality structure predictions in the PconsFam database, remodelling with Rosetta within the AMPLE pipeline proved to be the best approach, generating ensemble search models from single-structure deposits. Finally, it is shown that the AMPLE-obtained search models deriving from GREMLIN deposits are of sufficiently high quality to be selected by the sequence-independent MR pipeline SIMBAD. Overall, the results help to point the way towards the optimal use of the expanding databases of ab initio structure predictions.
Affiliation(s)
- Adam J. Simpkin
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Jens M. H. Thomas
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Felix Simkovic
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Ronan M. Keegan
- STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot OX11 0FA, England
- Daniel J. Rigden
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
9
Skwark MJ, Torres PHM, Copoiu L, Bannerman B, Floto RA, Blundell TL. Mabellini: a genome-wide database for understanding the structural proteome and evaluating prospective antimicrobial targets of the emerging pathogen Mycobacterium abscessus. Database (Oxford) 2019; 2019:5611286. [PMID: 31681953 PMCID: PMC6853642 DOI: 10.1093/database/baz113] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 04/03/2019] [Revised: 07/31/2019] [Accepted: 08/28/2019] [Indexed: 02/02/2023]
Abstract
Mycobacterium abscessus, a rapidly growing, multidrug-resistant, nontuberculous mycobacterium, can cause a wide range of opportunistic infections, particularly in immunocompromised individuals. M. abscessus has emerged as a growing threat to patients with cystic fibrosis, in whom it causes accelerated inflammatory lung damage, is difficult and sometimes impossible to treat, and can prevent safe transplantation. There is therefore an urgent unmet need to develop new therapeutic strategies. The elucidation of the M. abscessus genome in 2009 opened a wide range of research possibilities in drug discovery that can be exploited more effectively once the structural proteome is characterized. Where there are no experimental structures, we have used the available amino acid sequences to create 3D models of the majority of the remaining proteins that constitute the M. abscessus proteome (3394 proteins and over 13 000 models) using a range of up-to-date computational tools, many developed by our own group. The models are freely available for download in an online database, together with quality data and functional annotation. Furthermore, we have developed an intuitive and user-friendly web interface (http://www.mabellinidb.science) that enables easy browsing, querying and retrieval of the proteins of interest. We believe that this resource will be useful in evaluating prospective targets for the design of antimicrobial agents and will serve as a cornerstone to support the development of new molecules to treat M. abscessus infections.
Affiliation(s)
- Marcin J Skwark
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Pedro H M Torres
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Liviu Copoiu
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Bridget Bannerman
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- R Andres Floto
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, MRC-Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
- Cambridge Centre for Lung Infection, Royal Papworth Hospital, Cambridge CB23 3RE, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK (corresponding author; Tel: +44 1223 333628; Fax: +44 1223 766002)
10
Herman LS, Fornace K, Phelan J, Grigg MJ, Anstey NM, William T, Moon RW, Blackman MJ, Drakeley CJ, Tetteh KKA. Identification and validation of a novel panel of Plasmodium knowlesi biomarkers of serological exposure. PLoS Negl Trop Dis 2018; 12:e0006457. [PMID: 29902183 PMCID: PMC6001954 DOI: 10.1371/journal.pntd.0006457] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/05/2017] [Accepted: 04/17/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Plasmodium knowlesi is the most common cause of malaria in Malaysian Borneo, but reporting is limited to clinical cases presenting to health facilities, and data on the true extent of transmission are scarce. Serological estimations of transmission have been used with other malaria species to garner information about epidemiological patterns. However, there is a distinct lack of suitable serosurveillance tools for this neglected disease. METHODOLOGY/PRINCIPAL FINDINGS Using in silico tools, we designed and expressed four novel P. knowlesi protein products to address this gap: PkSERA3 antigens 1 and 2, PkSSP2/TRAP and PkTSERA2 antigen 1. Antibody prevalence to these antigens was determined by ELISA at three time-points post-treatment in a hospital-based clinical treatment trial in Sabah, East Malaysia (n = 97 individuals; 241 total samples for all time points). Higher responses were observed for PkSERA3 antigen 2 (67%, 65/97) across all time-points (day 0: 36.9% 34/92; day 7: 63.8% 46/72; day 28: 58.4% 45/77), with significant differences between the clinical cases and controls (n = 55, mean plus 3 SD) (day 0 p<0.0001; day 7 p<0.0001; day 28 p<0.0001). Using boosted regression trees, we developed models to classify P. knowlesi exposure (cross-validated AUC 88.9%; IQR 86.1-91.3%) and identified the most predictive antibody responses. CONCLUSIONS/SIGNIFICANCE PkSERA3 antigen 2 had the highest relative variable importance in all models. Further validation of these antigens is underway to determine the specificity of these tools in the context of multi-species infections at the population level.
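The phrase "controls (n = 55, mean plus 3 SD)" reads as the common rule of setting a seropositivity cutoff at the control mean plus three standard deviations; that reading is our interpretation. The Python sketch below shows the rule under further assumptions. The abstract states neither whether the sample or the population SD was used, nor whether a value equal to the cutoff counts as positive. This sketch assumes the sample SD and a strict "greater than" comparison. The optical densities are made-up toy numbers, not study data.

```python
from statistics import mean, stdev


def seropositivity_cutoff(control_ods, n_sd=3):
    """Cutoff = mean + n_sd * sample standard deviation of control responses."""
    return mean(control_ods) + n_sd * stdev(control_ods)


def classify(sample_ods, cutoff):
    """Flag each sample as seropositive if its response exceeds the cutoff."""
    return [od > cutoff for od in sample_ods]


# Toy optical densities for illustration only (not data from the study).
controls = [0.10, 0.12, 0.08, 0.10]
samples = [0.05, 0.20, 0.15]
cutoff = seropositivity_cutoff(controls)
flags = classify(samples, cutoff)
prevalence = sum(flags) / len(flags)
print(f"cutoff={cutoff:.3f}, seropositive={flags}, prevalence={prevalence:.2f}")
```

Swapping `stdev` for `statistics.pstdev` gives the population-SD variant, which yields a slightly lower cutoff on small control panels.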
Affiliation(s)
- Lou S. Herman
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Kimberly Fornace
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Jody Phelan
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Matthew J. Grigg
- Menzies School of Health Research and Charles Darwin University, Darwin, Northern Territory, Australia
- Infectious Diseases Society Sabah-Menzies School of Health Research Clinical Research Unit, Kota Kinabalu, Sabah, Malaysia
- Nicholas M. Anstey
- Menzies School of Health Research and Charles Darwin University, Darwin, Northern Territory, Australia
- Infectious Diseases Society Sabah-Menzies School of Health Research Clinical Research Unit, Kota Kinabalu, Sabah, Malaysia
- Timothy William
- Infectious Diseases Society Sabah-Menzies School of Health Research Clinical Research Unit, Kota Kinabalu, Sabah, Malaysia
- Clinical Research Centre, Queen Elizabeth Hospital, Kota Kinabalu, Sabah, Malaysia
- Jesselton Medical Centre, Kota Kinabalu, Sabah, Malaysia
- Robert W. Moon
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Michael J. Blackman
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Malaria Biochemistry Laboratory, The Francis Crick Institute, London, United Kingdom
- Chris J. Drakeley
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Kevin K. A. Tetteh
- Department of Immunology and Infection, London School of Hygiene and Tropical Medicine, London, United Kingdom
11
|
Houston S, Lithgow KV, Osbak KK, Kenyon CR, Cameron CE. Functional insights from proteome-wide structural modeling of Treponema pallidum subspecies pallidum, the causative agent of syphilis. BMC Struct Biol 2018; 18:7. [PMID: 29769048 PMCID: PMC5956850 DOI: 10.1186/s12900-018-0086-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/12/2017] [Accepted: 04/27/2018] [Indexed: 12/21/2022]
Abstract
Background Syphilis continues to be a major global health threat, with 11 million new infections each year and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium; however, the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is because T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct, with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Results Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously published annotated proteome, 167 (95%) could be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity to proteins experimentally confirmed to be required for virulence in other pathogens. Conclusions Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate, with high confidence, the primary structure-based annotation of more than 50% of all T. pallidum proteins. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach to the functional annotation of a whole proteome. Electronic supplementary material: The online version of this article (10.1186/s12900-018-0086-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Simon Houston
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Karen Vivien Lithgow
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Chris Richard Kenyon
- HIV/STI Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Division of Infectious Diseases and HIV Medicine, University of Cape Town, Cape Town, South Africa
- Caroline E Cameron
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada

12
Shimizu K, Cao W, Saad G, Shoji M, Terada T. Comparative analysis of membrane protein structure databases. Biochim Biophys Acta Biomembr 2018; 1860:1077-1091. [PMID: 29331638 DOI: 10.1016/j.bbamem.2018.01.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/08/2017] [Revised: 12/28/2017] [Accepted: 01/04/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND Membrane proteins play important roles in cell survival and cell communication, as they function as transporters, receptors, anchors and enzymes. They are also potential targets for drugs that block receptors or inhibit enzymes related to diseases. Although the number of known structures of membrane proteins is still small relative to the size of the proteome as a whole, many new membrane protein structures have been determined recently. SCOPE OF THE ARTICLE We compared and analyzed the widely used membrane protein databases mpstruc, Orientations of Proteins in Membranes (OPM), and PDBTM, as well as the extended dataset of mpstruc based on sequence similarity, the PDB structures whose classification field indicates that they are "membrane proteins", and the proteins with Structural Classification of Proteins (SCOP) class-f domains. We evaluated the relationships between these databases or datasets based on the overlap in their contents and the degree of consistency in the structural, topological, and functional classifications and in the transmembrane domain assignment. MAJOR CONCLUSIONS The membrane databases differ from each other in their coverage and in the criteria that they use for annotation and classification. To ensure the efficient use of these databases, it is important to understand their differences and similarities. The establishment of more detailed and consistent annotations for the sequence, structure, membrane association, and function of membrane proteins is still required. GENERAL SIGNIFICANCE Considering the recent growth of experimentally determined structures, a broad survey and cumulative analysis of the sum of knowledge as presented in the membrane protein structure databases can be helpful to elucidate structures and functions of membrane proteins. We also aim to provide a framework for future research and classification of membrane proteins.
Affiliation(s)
- Kentaro Shimizu
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Wei Cao
- Faculty of Information Networking for Innovation and Design, Toyo University, Tokyo, Japan
- Gull Saad
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Michiru Shoji
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Tohru Terada
- Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan

13
Han Z, Wei G. Computational tools for Hi-C data analysis. Quant Biol 2017. [DOI: 10.1007/s40484-017-0113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

14
Lessons from making the Structural Classification of Proteins (SCOP) and their implications for protein structure modelling. Biochem Soc Trans 2017; 44:937-43. [PMID: 27284063 PMCID: PMC5011417 DOI: 10.1042/bst20160053] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/03/2016] [Indexed: 12/04/2022]
Abstract
The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling.

15
Minde D, Dunker AK, Lilley KS. Time, space, and disorder in the expanding proteome universe. Proteomics 2017; 17:1600399. [PMID: 28145059 PMCID: PMC5573936 DOI: 10.1002/pmic.201600399] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 12/09/2016] [Revised: 01/16/2017] [Accepted: 01/25/2017] [Indexed: 12/31/2022]
Abstract
Proteins are highly dynamic entities. Their myriad functions require specific structures, but proteins' dynamic nature ranges all the way from the local mobility of their amino acid constituents to mobility within and well beyond single cells. A truly comprehensive view of the dynamic structural proteome includes: (i) alternative sequences, (ii) alternative conformations, (iii) alternative interactions with a range of biomolecules, (iv) cellular localizations, (v) alternative behaviors in different cell types. While these aspects have traditionally been explored one protein at a time, we highlight recently emerging global approaches that accelerate comprehensive insights into these facets of the dynamic nature of protein structure. Computational tools that integrate and expand on multiple orthogonal data types promise to enable the transition from a disjointed list of static snapshots to a structurally explicit understanding of the dynamics of cellular mechanisms.
Affiliation(s)
- David‐Paul Minde
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Kathryn S. Lilley
- Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
- Department of Biochemistry, University of Cambridge, Cambridge, UK

16
Müller I. Guidelines for the successful generation of protein-ligand complex crystals. Acta Crystallogr D Struct Biol 2017; 73:79-92. [PMID: 28177304 PMCID: PMC5297911 DOI: 10.1107/s2059798316020271] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 05/06/2016] [Accepted: 12/21/2016] [Indexed: 11/23/2022]
Abstract
With continuous technical improvements at synchrotron facilities, data-collection rates have increased dramatically. This makes it possible to collect diffraction data for hundreds of protein-ligand complexes within a day, provided that a suitable crystal system is at hand. However, developing a suitable crystal system can prove challenging, exceeding the timescale of data collection by several orders of magnitude. Firstly, a useful crystallization construct of the protein of interest needs to be chosen and its expression and purification optimized before screening for suitable crystallization and soaking conditions can start. This article reviews recent publications analysing large data sets of crystallization trials, with the aim of identifying factors that do or do not make a good crystallization construct, and gives guidance on the design of an expression construct. It provides an overview of common protein-expression systems, addresses how ligand binding can be both a help and a hindrance for protein purification, and describes ligand co-crystallization and soaking, with an emphasis on troubleshooting.
Affiliation(s)
- Ilka Müller
- Structural Biology, Discovery from Charles River, Chesterford Research Park, Saffron Walden CB10 1XL, England

17
Chandonia JM, Fox NK, Brenner SE. SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database. J Mol Biol 2016; 429:348-355. [PMID: 27914894 DOI: 10.1016/j.jmb.2016.11.023] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 07/30/2016] [Revised: 11/23/2016] [Accepted: 11/24/2016] [Indexed: 12/23/2022]
Abstract
SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP.
Affiliation(s)
- John-Marc Chandonia
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Naomi K Fox
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Steven E Brenner
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA

18
The history of the CATH structural classification of protein domains. Biochimie 2015; 119:209-17. [PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/01/2015] [Accepted: 08/01/2015] [Indexed: 11/21/2022]
Abstract
This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated, allowing insights into functional divergence and evolution within protein families.

19
Yachdav G, Goldberg T, Wilzbach S, Dao D, Shih I, Choudhary S, Crouch S, Franz M, García A, García LJ, Grüning BA, Inupakutika D, Sillitoe I, Thanki AS, Vieira B, Villaveces JM, Schneider MV, Lewis S, Pettifer S, Rost B, Corpas M. Anatomy of BioJS, an open source community for the life sciences. eLife 2015; 4. [PMID: 26153621 PMCID: PMC4495654 DOI: 10.7554/elife.07009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 02/14/2015] [Accepted: 06/20/2015] [Indexed: 01/12/2023]
Abstract
BioJS is an open source software project that develops visualization tools for different types of biological data. Here we report on the factors that influenced the growth of the BioJS user and developer community, and outline our strategy for building on this growth. The lessons we have learned on BioJS may also be relevant to other open source software projects.
Affiliation(s)
- Guy Yachdav
- Bioinformatik, Biosof LLC, Garching, Germany
- Tatyana Goldberg
- Bioinformatik, Technische Universität München, Garching, Germany
- David Dao
- Bioinformatik, Technische Universität München, Garching, Germany
- Iris Shih
- Bioinformatik, Technische Universität München, Garching, Germany
- Saket Choudhary
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Steve Crouch
- Web and Internet Science, University of Southampton, Southampton, United Kingdom
- Max Franz
- Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Leyla J García
- European Molecular Biology Laboratory-European Bioinformatics Institute, Cambridge, United Kingdom
- Björn A Grüning
- Bioinformatics Group, Department of Computer Science and Centre for Biological Systems Analysis, University of Freiburg, Freiburg, Germany
- Devasena Inupakutika
- Web and Internet Science, University of Southampton, Southampton, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Bruno Vieira
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Suzanna Lewis
- Lawrence Berkeley National Laboratory, Berkeley, United States
- Steve Pettifer
- School of Computer Science, The University of Manchester, Manchester, United Kingdom

20
Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions. Bioinformatics 2015; 31:3429-36. [DOI: 10.1093/bioinformatics/btv345] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 03/04/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022]

21
Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 2015. [PMID: 25950237 DOI: 10.1038/nprot.2015.053] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
Affiliation(s)
- Lawrence A Kelley
- Structural Bioinformatics Group, Imperial College London, London, UK
- Stefans Mezulis
- Structural Bioinformatics Group, Imperial College London, London, UK
- Mark N Wass
- Structural Bioinformatics Group, Imperial College London, London, UK

23
Ochoa-Montaño B, Mohan N, Blundell TL. CHOPIN: a web resource for the structural and functional proteome of Mycobacterium tuberculosis. Database (Oxford) 2015; 2015:bav026. [PMID: 25833954 PMCID: PMC4381106 DOI: 10.1093/database/bav026] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/18/2014] [Accepted: 03/01/2015] [Indexed: 11/18/2022]
Abstract
Tuberculosis kills more than a million people annually and presents increasingly high levels of resistance against current first line drugs. Structural information about Mycobacterium tuberculosis (Mtb) proteins is a valuable asset for the development of novel drugs and for understanding the biology of the bacterium; however, only about 10% of the ∼4000 proteins have had their structures determined experimentally. The CHOPIN database assigns structural domains and generates homology models for 2911 sequences, corresponding to ∼73% of the proteome. A sophisticated pipeline allows multiple models to be created using conformational states characteristic of different oligomeric states and ligand binding, such that the models reflect various functional states of the proteins. Additionally, CHOPIN includes structural analyses of mutations potentially associated with drug resistance. Results are made available at the web interface, which also serves as an automatically updated repository of all published Mtb experimental structures. Its RESTful interface allows direct and flexible access to structures and metadata via intuitive URLs, enabling easy programmatic use of the models. Database URL: http://structure.bioc.cam.ac.uk/chopin
Affiliation(s)
- Bernardo Ochoa-Montaño
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Nishita Mohan
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India

24
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 251] [Impact Index Per Article: 27.9] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
- Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK

25
Wang R, Perez-Riverol Y, Hermjakob H, Vizcaíno JA. Open source libraries and frameworks for biological data visualisation: a guide for developers. Proteomics 2015; 15:1356-74. [PMID: 25475079 PMCID: PMC4409855 DOI: 10.1002/pmic.201400377] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/07/2014] [Revised: 10/21/2014] [Accepted: 11/26/2014] [Indexed: 12/21/2022]
Abstract
Recent advances in high-throughput experimental techniques have led to an exponential increase in both the size and the complexity of the data sets commonly studied in biology. Data visualisation is increasingly used as the key to unlock this data, going from hypothesis generation to model evaluation and tool implementation. It is becoming more and more the heart of bioinformatics workflows, enabling scientists to reason and communicate more effectively. In parallel, there has been a corresponding trend towards the development of related software, which has triggered the maturation of different visualisation libraries and frameworks. For bioinformaticians, scientific programmers and software developers, the main challenge is to pick out the most fitting one(s) to create clear, meaningful and integrated data visualisation for their particular use cases. In this review, we introduce a collection of open source or free to use libraries and frameworks for creating data visualisation, covering the generation of a wide variety of charts and graphs. We will focus on software written in Java, JavaScript or Python. We truly believe this software offers the potential to turn tedious data into exciting visual stories.
Affiliation(s)
- Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

26
Abstract
A key reason three-dimensional (3-D) protein structures are annotated with supporting or derived information is to understand the molecular basis of protein function. To this end, protein structure annotation databases curate key facts and observations, based on community-accepted standards, about the ~100,000 3-D experimental protein structures to date. This review will introduce the primary structure repositories, databases, and value-added structural annotation databases, as well as the range of information they provide. The different levels of annotation data (primary vs. derived vs. inferred) and how they should all be considered accordingly will also be described.
Affiliation(s)
- Margaret J. Gabanyi
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Helen M. Berman
- Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
27
Hua Y, Zhu M, Wang Y, Xie Z, Li M. A hybrid method for identification of structural domains. Sci Rep 2014; 4:7476. [PMID: 25503992 PMCID: PMC4265785 DOI: 10.1038/srep07476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2014] [Accepted: 11/25/2014] [Indexed: 11/10/2022]
Abstract
Structural domains are the basic units from which proteins are built, and they play important roles in protein evolution and function. However, there is not yet a precise definition of a domain, and structural domain databases have long update cycles. Automatic algorithms identify domains slowly, while the number of structurally complex protein entries is rising. Here, we present a method that identifies structural domains by recognizing compact, modular segments of polypeptide chains, and we compare results on several data sets to illustrate its effectiveness. The method combines a support vector machine (SVM) with the K-means algorithm. It is faster and more stable than most current algorithms and performs better. It also shows that, when proteins are represented as sets of alpha-carbon atoms in 3D space, structural domains can be identified from their spatial structural properties. We have developed a web server to help identify structural domains (http://vis.sculab.org/~huayongpan/cgi-bin/domainAssignment.cgi).
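The abstract gives no implementation details. As a rough illustration of the clustering half of the idea only, the sketch below partitions a chain into k spatial groups by running plain k-means (Lloyd's algorithm) on C-alpha coordinates. It assumes the number of domains k is already known and uses a deterministic initialisation from residues spaced evenly along the chain; the name `kmeans_domains` is hypothetical, and the SVM step the authors combine with K-means is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def kmeans_domains(ca_coords: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Label each residue with one of k spatial clusters (hypothetical helper).

    Plain Lloyd's k-means on C-alpha coordinates; it ignores sequence
    contiguity and is NOT the published SVM + K-means method.
    """
    n = len(ca_coords)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of residues")

    # Deterministic initialisation: residues spaced evenly along the chain.
    centroids = [tuple(map(float, ca_coords[(i * n) // k])) for i in range(k)]
    labels: List[int] = []

    for _ in range(max_iter):
        # Assignment step: each residue joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in ca_coords
        ]
        if new_labels == labels:  # converged: no residue changed cluster
            break
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(ca_coords, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))

    return labels
```

For two well-separated groups of four points, such as `[(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (10, 10, 10), (11, 10, 10), (10, 11, 10), (11, 11, 10)]`, `kmeans_domains(coords, 2)` returns `[0, 0, 0, 0, 1, 1, 1, 1]`. A real domain assigner would also need to respect sequence contiguity and choose k automatically, which this sketch does not attempt.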
Affiliation(s)
- Yongpan Hua
- College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, 610064 Chengdu, China
- Min Zhu
- College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, 610064 Chengdu, China
- Yuelong Wang
- College of Chemistry, Sichuan University, No.24 South Section 1, Yihuan Road, 610065 Chengdu, China
- Zhaoyang Xie
- College of Computer Science, Sichuan University, No.24 South Section 1, Yihuan Road, 610064 Chengdu, China
- Menglong Li
- College of Chemistry, Sichuan University, No.24 South Section 1, Yihuan Road, 610065 Chengdu, China
28
Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong SY, Bateman A, Punta M, Attwood TK, Sigrist CJA, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H, Thomas PD, Finn RD. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res 2014; 43:D213-21. [PMID: 25428371 PMCID: PMC4383996 DOI: 10.1093/nar/gku1243] [Citation(s) in RCA: 941] [Impact Index Per Article: 94.1] [Indexed: 11/30/2022]
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.
Affiliation(s)
- Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Louise Daugherty
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Matthew Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sarah Hunter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Craig McAnulla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Conor McMenamin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Maxim Scheremetjew
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Claudia Rato
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Siew-Yit Yong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Marco Punta
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Teresa K Attwood
- Faculty of Life Science and School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Christian J A Sigrist
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Nicole Redaschi
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Catherine Rivoire
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland
- Ioannis Xenarios
- Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Department of Biochemistry, University of Geneva, 1211 Geneva, Switzerland
- Daniel Kahn
- Pôle Rhône-Alpin de Bio-Informatique (PRABI), Batiment G. Mendel, Universite Claude Bernard, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex, France
- Dominique Guyot
- Pôle Rhône-Alpin de Bio-Informatique (PRABI), Batiment G. Mendel, Universite Claude Bernard, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex, France
- Peer Bork
- European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Ivica Letunic
- European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Julian Gough
- Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK
- Matt Oates
- Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK
- Daniel Haft
- J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, MD 20850, USA
- Hongzhan Huang
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA
- Darren A Natale
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA
- Cathy H Wu
- Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
- Christine Orengo
- Structural and Molecular Biology Department, University College London, University of London, London, WC1E 6BT, UK
- Ian Sillitoe
- Structural and Molecular Biology Department, University College London, University of London, London, WC1E 6BT, UK
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
29
Oates ME, Stahlhacke J, Vavoulis DV, Smithers B, Rackham OJL, Sardar AJ, Zaucha J, Thurlby N, Fang H, Gough J. The SUPERFAMILY 1.75 database in 2014: a doubling of data. Nucleic Acids Res 2014; 43:D227-33. [PMID: 25414345 PMCID: PMC4383889 DOI: 10.1093/nar/gku1041] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 01/01/2023]
Abstract
We present updates to the SUPERFAMILY 1.75 (http://supfam.org) online resource and protein sequence collection. The hidden Markov model library that provides sequence homology to SCOP structural domains remains unchanged at version 1.75. In the last 4 years SUPERFAMILY has more than doubled its holding of curated complete proteomes over all cellular life, from 1400 proteomes reported previously in 2010 up to 3258 at present. Outside of the main sequence collection, SUPERFAMILY continues to provide domain annotation for sequences provided by other resources such as: UniProt, Ensembl, PDB, much of JGI Phytozome and selected subcollections of NCBI RefSeq. Despite this growth in data volume, SUPERFAMILY now provides users with an expanded and daily updated phylogenetic tree of life (sTOL). This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence, and—in the case of whole genomes—with enrichment analysis against a taxonomically defined background.
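A common way to test whether an annotation is over-represented in a genome relative to a background set is a one-sided hypergeometric test. The abstract does not say which test SUPERFAMILY applies, so treat the following as an assumed, minimal sketch; `enrichment_pvalue` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) (hypothetical helper).

    N: size of the background (e.g. proteins from a taxonomic group)
    K: background items carrying the annotation of interest
    n: items drawn, i.e. proteins in the genome being tested
    k: drawn items carrying the annotation
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and k >= 0")
    total = comb(N, n)
    # Sum the probabilities of seeing k or more annotated items among n draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total
```

For example, with N = 10, K = 4, n = 3 and k = 2 the tail sums to 40 out of 120 possible draws, so `enrichment_pvalue(2, 3, 4, 10)` is about 0.333. When many annotations are tested at once, the resulting p-values would normally also need a multiple-testing correction.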
Affiliation(s)
- Matt E Oates
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Ben Smithers
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Owen J L Rackham
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Medical Research Council Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital, London, UK
- Adam J Sardar
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; e-Therapeutics plc, 17 Blenheim Office Park, Long Hanborough, Oxfordshire, OX29 8LN, UK
- Jan Zaucha
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Natalie Thurlby
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK; Bristol Centre for Complexity Sciences, University of Bristol, Bristol, UK
- Hai Fang
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Julian Gough
- Computer Science, University of Bristol, Bristol, BS8 1UB, UK
30
Lewis TE, Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Chothia C, Cozzetto D, Dana JM, Filippis I, Gough J, Jones DT, Kelley LA, Kleywegt GJ, Minneci F, Mistry J, Murzin AG, Ochoa-Montaño B, Oates ME, Punta M, Rackham OJL, Stahlhacke J, Sternberg MJE, Velankar S, Orengo C. Genome3D: exploiting structure to help users understand their sequences. Nucleic Acids Res 2014; 43:D382-6. [PMID: 25348407 PMCID: PMC4384030 DOI: 10.1093/nar/gku973] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 12/03/2022]
Abstract
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
Affiliation(s)
- Tony E Lewis
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Daniel W A Buchan
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
- Cyrus Chothia
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
- Domenico Cozzetto
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
- José M Dana
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Ioannis Filippis
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Julian Gough
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
- David T Jones
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK; Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
- Lawrence A Kelley
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Gerard J Kleywegt
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Federico Minneci
- Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK
- Jaina Mistry
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Alexey G Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 0QH, UK
- Bernardo Ochoa-Montaño
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Matt E Oates
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
- Marco Punta
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Owen J L Rackham
- MRC Clinical Sciences Centre, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN, UK
- Jonathan Stahlhacke
- Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, UK
- Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Sameer Velankar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, London, WC1E 6BT, UK
31
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1410] [Impact Index Per Article: 141.0] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
- Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
- MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
- Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
- Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
- Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
- Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
- Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
- Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
- Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
32
Lu HC, Fornili A, Fraternali F. Protein-protein interaction networks studies and importance of 3D structure knowledge. Expert Rev Proteomics 2014; 10:511-20. [PMID: 24206225 DOI: 10.1586/14789450.2013.856764] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/18/2023]
Abstract
Protein-protein interaction networks (PPINs) are a powerful tool to study biological processes in living cells. In this review, we present the progress of PPIN studies from abstract to more detailed representations. We will focus on 3D interactome networks, which offer detailed information at the atomic level. This information can be exploited in understanding not only the underlying cellular mechanisms, but also how human variants and disease-causing mutations affect protein functions and complexes' stability. Recent studies have used structural information on PPINs to also understand the molecular mechanisms of binding partner selection. We will address the challenges in generating 3D PPINs due to the restricted number of solved protein structures. Finally, some of the current use of 3D PPINs will be discussed, highlighting their contribution to the studies in genotype-phenotype relationships and in the optimization of targeted studies to design novel chemical compounds for medical treatments.
Affiliation(s)
- Hui-Chun Lu
- Randall Division of Cell and Molecular Biophysics, King's College London, New Hunt's House, London SE1 1UL, UK
33
Yates CM, Filippis I, Kelley LA, Sternberg MJE. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J Mol Biol 2014; 426:2692-701. [PMID: 24810707 PMCID: PMC4087249 DOI: 10.1016/j.jmb.2014.04.026] [Citation(s) in RCA: 162] [Impact Index Per Article: 16.2] [Received: 01/31/2014] [Revised: 04/23/2014] [Accepted: 04/28/2014] [Indexed: 11/16/2022]
Abstract
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Bioinformatics approaches are key for identification of disease-causing variants. SAV phenotype prediction can be improved using network information. A method including these features, SuSPect, outperforms tested methods. SuSPect is available to use at www.sbg.bio.ic.ac.uk/suspect.
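Balanced accuracy, the figure SuSPect reports on VariBench, is conventionally the mean of sensitivity (true-positive rate) and specificity (true-negative rate), so a predictor cannot score well simply by always predicting the majority class. A minimal sketch of the metric follows; the function name and the labels `"disease"` and `"neutral"` are illustrative assumptions, not SuSPect's output format.

```python
from typing import Hashable, Sequence


def balanced_accuracy(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = "disease"
) -> float:
    """Mean of sensitivity and specificity for a binary labelling (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts, treating every non-positive label as negative.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

For instance, with 4 disease-associated and 6 neutral variants, a predictor that flags 3 of the 4 disease variants and wrongly flags 1 neutral variant has sensitivity 0.75 and specificity 5/6, giving a balanced accuracy of about 0.792.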
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Ioannis Filippis
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London SW7 2AZ, UK
34
Corpas M, Jimenez R, Carbon SJ, García A, Garcia L, Goldberg T, Gomez J, Kalderimis A, Lewis SE, Mulvany I, Pawlik A, Rowland F, Salazar G, Schreiber F, Sillitoe I, Spooner WH, Thanki AS, Villaveces JM, Yachdav G, Hermjakob H. BioJS: an open source standard for biological visualisation - its status in 2014. F1000Res 2014; 3:55. [PMID: 25075290 PMCID: PMC4103492 DOI: 10.12688/f1000research.3-55.v1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Accepted: 02/13/2014] [Indexed: 11/20/2022]
Abstract
BioJS is a community-based standard and repository of functional components to represent biological information on the web. The development of BioJS has been prompted by the growing need for bioinformatics visualisation tools to be easily shared, reused and discovered. Its modular architecture makes it easy for users to find a specific functionality without needing to know how it has been built, while components can be extended or created for implementing new functionality. The BioJS community of developers currently provides a range of functionality that is open access and freely available. A registry has been set up that categorises and provides installation instructions and testing facilities at http://www.ebi.ac.uk/tools/biojs/. The source code for all components is available for ready use at https://github.com/biojs/biojs.
Affiliation(s)
- Manuel Corpas
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Rafael Jimenez
- European Bioinformatics Institute EMBL-EBI, Hinxton, CB10 1SD, UK
- Seth J Carbon
- Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Alex García
- School of Library and Information Science, Florida State University, Tallahassee, FL, USA
- Leyla Garcia
- European Bioinformatics Institute EMBL-EBI, Hinxton, CB10 1SD, UK
- Tatyana Goldberg
- TUM, Department of Informatics, Bioinformatics & Computational Biology, 85748 Garching/Munich, Germany
- John Gomez
- European Bioinformatics Institute EMBL-EBI, Hinxton, CB10 1SD, UK
- Alexis Kalderimis
- Department of Genetics and Cambridge Systems Biology Centre, Cambridge University, Cambridge, CB2 3EH, UK
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Aleksandra Pawlik
- Faculty of Mathematics, Computing and Technology, Open University, UK, Milton Keynes, MK7 6AA, UK
- Francis Rowland
- European Bioinformatics Institute EMBL-EBI, Hinxton, CB10 1SD, UK
- Gustavo Salazar
- Computational Biology Group, University of Cape Town, Cape Town, South Africa
- Fabian Schreiber
- European Bioinformatics Institute EMBL-EBI, Hinxton, CB10 1SD, UK; The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SD, UK
- Ian Sillitoe
- Biomolecular Structure and Modelling Group, Department of Biochemistry, University College London, London, UK
- Anil S Thanki
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- José M Villaveces
- Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Germany
- Guy Yachdav
- TUM, Department of Informatics, Bioinformatics & Computational Biology, 85748 Garching/Munich, Germany; TUM Graduate School of Information Science in Health (GSISH), 85748 Garching/Munich, Germany; Biosof LLC, New York, NY, 10001, USA
35
Shi R, McDonald L, Cygler M, Ekiel I. Coiled-coil helix rotation selects repressing or activating state of transcriptional regulator DhaR. Structure 2014; 22:478-87. [PMID: 24440518 DOI: 10.1016/j.str.2013.11.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 09/04/2013] [Revised: 11/15/2013] [Accepted: 11/19/2013] [Indexed: 10/25/2022]
Abstract
Escherichia coli dihydroxyacetone (Dha) kinase consists of two subunits, DhaK and DhaL. Transcription of the dha operon is regulated by the DhaR transcription factor, and its action is under the control of the kinase subunits. DhaR is activated by interaction with DhaL, while it is repressed by DhaK. We have determined the structures of DhaK and DhaL bound to the tandem GAF-like and PAS domains of DhaR, providing an architectural model for how GAF/PAS tandem domains work together in binding protein partners. The structures reveal a mechanism of opposite transcriptional regulation by the DhaK and DhaL subunits. The kinase subunits interface with DhaR through surfaces that partially overlap with their active sites, allowing sensing of ATP- versus ADP-loaded DhaL subunit and also precluding a ternary complex between DhaK-DhaL and DhaR. The rotation of helices within the DhaR coiled-coil linker upon DhaL binding provides the mechanism for transmitting the binding signal from the GAF/PAS domains to the C-terminal DNA-binding domain.
Affiliation(s)
- Rong Shi
- Département de Biochimie, de Microbiologie et de Bio-informatique, PROTEO, Université Laval, Pavillon Charles-Eugene-Marchand, Québec City, QC G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Pavillon Charles-Eugene-Marchand, Québec City, QC G1V 0A6, Canada
- Laura McDonald
- Department of Chemistry and Biochemistry, Concordia University, 7141 Sherbrooke Street West, Montreal, QC H4B 1R6, Canada
- Miroslaw Cygler
- Department of Biochemistry, University of Saskatchewan, 107 Wiggins Road, Saskatoon, SK S7N 5E5, Canada; Department of Biochemistry, McGill University, 3655 Promenade Sir William Osler, Montreal, QC H3G 1Y6, Canada
- Irena Ekiel
- Department of Chemistry and Biochemistry, Concordia University, 7141 Sherbrooke Street West, Montreal, QC H4B 1R6, Canada; National Research Council of Canada, Life Sciences, 6100 Royalmount Avenue, Montreal, QC H4P 2R2, Canada
36
Yamada K, Tomii K. Revisiting amino acid substitution matrices for identifying distantly related proteins. Bioinformatics 2013; 30:317-25. [PMID: 24281694 PMCID: PMC3904525 DOI: 10.1093/bioinformatics/btt694] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/04/2022]
Abstract
Motivation: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matrix that can detect more remote homology than ever. Results: Using principal component analysis with existing matrices and benchmarks, we developed a novel matrix, which we designate as MIQS. The detection performance of MIQS is validated and compared with that of existing general purpose matrices using SSEARCH with optimized gap penalties for each matrix. Results show that MIQS is able to detect more remote homology than the existing matrices on an independent dataset. In addition, the performance of our developed matrix was superior to that of CS-BLAST, which was a novel similarity search method with no amino acid matrix. We also evaluated the alignment quality of matrices and methods, which revealed that MIQS shows higher alignment sensitivity than that with the existing matrix series and CS-BLAST. Fundamentally, these results are expected to constitute good proof of the availability and/or importance of amino acid matrices in sequence analysis. Moreover, with our developed matrix, sophisticated similarity search methods such as sequence–profile and profile–profile comparison methods can be improved further. Availability and implementation: Newly developed matrices and datasets used for this study are available at http://csas.cbrc.jp/Ssearch/. Contact: k-tomii@aist.go.jp. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kazunori Yamada
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
37
Lees JG, Lee D, Studer RA, Dawson NL, Sillitoe I, Das S, Yeats C, Dessailly BH, Rentzsch R, Orengo CA. Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis. Nucleic Acids Res 2013; 42:D240-5. [PMID: 24270792 PMCID: PMC3965083 DOI: 10.1093/nar/gkt1205] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 01/31/2023]
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
Affiliation(s)
- Jonathan G Lees
- Division of Biosciences, Institute of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK, Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, UK and Robert Koch Institut, Research Group Bioinformatics Ng4, Nordufer 20, 13353 Berlin, Germany
38
Chakraborty S, Venkatramani R, Rao BJ, Asgeirsson B, Dandekar AM. The electrostatic profile of consecutive Cβ atoms applied to protein structure quality assessment. F1000Res 2013; 2:243. [PMID: 25506420 PMCID: PMC4257144 DOI: 10.12688/f1000research.2-243.v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 09/16/2014] [Indexed: 02/10/2024]
Abstract
The structure of a protein provides insight into its physiological interactions with other components of the cellular soup. Methods that predict putative structures from sequences typically yield multiple, closely ranked possibilities. A critical component in the process is the model quality assessment program (MQAP), which selects the best candidate from this pool of structures. Here, we present a novel MQAP based on the physical properties of side-chain atoms. We propose a method for assessing the quality of protein structures based on the electrostatic potential difference (EPD) of Cβ atoms in consecutive residues. We demonstrate that the EPDs of Cβ atoms on consecutive residues provide unique signatures of the amino acid types. The EPDs of Cβ atoms are learned from a set of 1000 non-homologous protein structures with a resolution cutoff of 1.6 Å obtained from the PISCES database. Based on the Boltzmann hypothesis that lower-energy conformations are proportionately sampled more, and on Anfinsen's thermodynamic hypothesis that the native structure of a protein is the minimum free energy state, we hypothesize that the deviation of observed EPD values from the mean values obtained in the learning phase is minimized in the native structure. We achieved an average specificity of 0.91, 0.94 and 0.93 on the hg_structal, 4state_reduced and ig_structal decoy sets, respectively, taken from the Decoys 'R' Us database. The source code and manual are available at https://github.com/sanchak/mqap and permanently archived at DOI 10.5281/zenodo.7134.
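The selection criterion described above (learn a mean EPD per consecutive residue-type pair from native structures, then prefer the model whose EPDs deviate least from those means) can be sketched as follows. This assumes the per-pair EPD values have already been computed, since the electrostatic potential calculation itself is outside the sketch. The input format, the function names, and the use of mean absolute deviation as the score are all hypothetical choices, not the paper's published implementation.

```python
from collections import defaultdict
from statistics import mean

# A residue-type pair for two consecutive residues, e.g. ("A", "G").
Pair = tuple[str, str]
# Hypothetical input format: one (pair, EPD) entry per consecutive residue pair.
EpdProfile = list[tuple[Pair, float]]


def learn_pair_means(natives: list[EpdProfile]) -> dict[Pair, float]:
    """Learning phase: average the EPD seen for each residue-type pair over native structures."""
    values: dict[Pair, list[float]] = defaultdict(list)
    for profile in natives:
        for pair, epd in profile:
            values[pair].append(epd)
    return {pair: mean(vals) for pair, vals in values.items()}


def deviation_score(model: EpdProfile, means: dict[Pair, float]) -> float:
    """Mean absolute deviation of a model's EPDs from the learned means (assumed metric).

    Pairs never seen during learning are skipped; a model with no scorable pairs gets +inf.
    """
    devs = [abs(epd - means[pair]) for pair, epd in model if pair in means]
    return mean(devs) if devs else float("inf")


def pick_best(candidates: dict[str, EpdProfile], means: dict[Pair, float]) -> str:
    """Return the candidate with the smallest deviation, i.e. the most native-like by this criterion."""
    return min(candidates, key=lambda name: deviation_score(candidates[name], means))


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions.
    natives = [
        [(("A", "G"), 1.0), (("G", "L"), -2.0)],
        [(("A", "G"), 3.0), (("G", "L"), -4.0)],
    ]
    means = learn_pair_means(natives)  # {("A", "G"): 2.0, ("G", "L"): -3.0}
    candidates = {
        "model_1": [(("A", "G"), 2.1), (("G", "L"), -3.1)],
        "model_2": [(("A", "G"), 5.0), (("G", "L"), 0.0)],
    }
    print(pick_best(candidates, means))  # model_1
```

Averaging over pairs rather than summing keeps the score comparable between models of different lengths; whether the original method normalizes this way is not stated in the abstract.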
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Ravindra Venkatramani
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Basuthkar J. Rao
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Bjarni Asgeirsson
- Science Institute, Department of Biochemistry, University of Iceland, IS-107 Reykjavik, Iceland
- Abhaya M. Dandekar
- Plant Sciences Department, University of California, Davis, CA, 95616, USA
39
Yan R, Adinolfi S, Iannuzzi C, Kelly G, Oregioni A, Martin S, Pastore A. Cluster and fold stability of E. coli ISC-type ferredoxin. PLoS One 2013; 8:e78948. [PMID: 24265733 PMCID: PMC3827102 DOI: 10.1371/journal.pone.0078948] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/07/2013] [Accepted: 09/22/2013] [Indexed: 11/25/2022]
Abstract
Iron-sulfur clusters are essential protein prosthetic groups that provide their redox potential to several different metabolic pathways. Formation of iron-sulfur clusters is assisted by a specialised machine that comprises, among other proteins, a ferredoxin. As a first step to elucidate the precise role of this protein in cluster assembly, we have studied the factors governing the stability and the dynamic properties of E. coli ferredoxin using different spectroscopic techniques. The cluster-loaded protein is monomeric and well structured with a flexible C-terminus but is highly oxygen sensitive so that it readily loses the cluster leading to an irreversible unfolding under aerobic conditions. This process is slowed down by reducing conditions and high ionic strengths. NMR relaxation experiments on the cluster-loaded protein also show that, once the cluster is in place, the protein forms a globular and relatively rigid domain. These data indicate that the presence of the iron-sulfur cluster is the switch between a functional and a non-functional state.
Affiliation(s)
- Robert Yan
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Salvatore Adinolfi
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Clara Iannuzzi
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Geoff Kelly
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Alain Oregioni
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Stephen Martin
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
- Annalisa Pastore
- Division of Molecular Structure, National Institute for Medical Research of the Medical Research Council, London, United Kingdom
40
Chakraborty S, Venkatramani R, Rao BJ, Asgeirsson B, Dandekar AM. The electrostatic profile of consecutive Cβ atoms applied to protein structure quality assessment. F1000Res 2013; 2:243. [PMID: 25506420 PMCID: PMC4257144 DOI: 10.12688/f1000research.2-243.v3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 09/16/2014] [Indexed: 12/23/2022]
Abstract
The structure of a protein provides insight into its physiological interactions with other components of the cellular soup. Methods that predict putative structures from sequences typically yield multiple, closely ranked possibilities. A critical component in the process is the model quality assessment program (MQAP), which selects the best candidate from this pool of structures. Here, we present a novel MQAP based on the physical properties of side-chain atoms. We propose a method for assessing the quality of protein structures based on the electrostatic potential difference (EPD) of Cβ atoms in consecutive residues. We demonstrate that the EPDs of Cβ atoms on consecutive residues provide unique signatures of the amino acid types. The EPDs of Cβ atoms are learned from a set of 1000 non-homologous protein structures with a resolution cutoff of 1.6 Å obtained from the PISCES database. Based on the Boltzmann hypothesis that lower-energy conformations are proportionately sampled more, and on Anfinsen's thermodynamic hypothesis that the native structure of a protein is the minimum free energy state, we hypothesize that the deviation of observed EPD values from the mean values obtained in the learning phase is minimized in the native structure. We achieved an average specificity of 0.91, 0.94 and 0.93 on the hg_structal, 4state_reduced and ig_structal decoy sets, respectively, taken from the Decoys 'R' Us database. The source code and manual are available at https://github.com/sanchak/mqap and permanently archived at DOI 10.5281/zenodo.7134.
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Ravindra Venkatramani
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Basuthkar J Rao
- Department of Biological Sciences, Tata Institute of Fundamental Research, Mumbai, 400 005, India
- Bjarni Asgeirsson
- Science Institute, Department of Biochemistry, University of Iceland, IS-107 Reykjavik, Iceland
- Abhaya M Dandekar
- Plant Sciences Department, University of California, Davis, CA, 95616, USA
41
Fernández-Suárez XM, Galperin MY. The 2013 Nucleic Acids Research Database Issue and the online molecular biology database collection. Nucleic Acids Res 2012. [PMID: 23203983 PMCID: PMC3531151 DOI: 10.1093/nar/gks1297] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/14/2022]
Abstract
The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases, while the other half provide updates on databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcription factors and transcription factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using genome data to improve human health is reflected in the development of databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNA-seq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).