51
Lewis TE, Sillitoe I, Lees JG. cath-resolve-hits: a new tool that resolves domain matches suspiciously quickly. Bioinformatics 2019; 35:1766-1767. [PMID: 30295745 PMCID: PMC6513158 DOI: 10.1093/bioinformatics/bty863] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/07/2018] [Revised: 09/18/2018] [Accepted: 10/05/2018] [Indexed: 11/26/2022]
Abstract
Motivation: Many bioinformatics areas require us to assign domain matches onto stretches of a query protein. Starting with a set of candidate matches, we want to identify the optimal subset that has limited/no overlap between matches. This may be further complicated by discontinuous domains in the input data. Existing tools are increasingly facing very large data-sets for which they require prohibitive amounts of CPU-time and memory. Results: We present cath-resolve-hits (CRH), a new tool that uses a dynamic-programming algorithm implemented in open-source C++ to handle large datasets quickly (up to ∼1 million hits/second) and in reasonable amounts of memory. It accepts multiple input formats and provides its output in plain text, JSON or graphical HTML. We describe a benchmark against an existing algorithm, which shows CRH delivers very similar or slightly improved results and very much improved CPU/memory performance on large datasets. Availability and implementation: CRH is available at https://github.com/UCLOrengoGroup/cath-tools; documentation is available at http://cath-tools.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics online.
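The selection problem described in this abstract (pick the best-scoring subset of candidate matches with no overlap) can be illustrated with the textbook weighted-interval-scheduling recurrence. The sketch below is only an illustration under simplifying assumptions: it is not CRH's implementation, it handles continuous segments only (no discontinuous domains, no tolerated overlap), and the `Hit` type, its fields, and `resolve_hits` are hypothetical names introduced here.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A candidate domain match on the query (hypothetical structure)."""
    name: str
    start: int    # first residue, inclusive
    stop: int     # last residue, inclusive
    score: float  # higher is better


def resolve_hits(hits: List[Hit]) -> List[Hit]:
    """Return a maximum-total-score subset of hits that share no residues."""
    ordered = sorted(hits, key=lambda h: h.stop)
    stops = [h.stop for h in ordered]
    # best[i] = best total score achievable using only the first i hits.
    best = [0.0] * (len(ordered) + 1)
    take = [False] * len(ordered)
    prev = [0] * len(ordered)
    for i, hit in enumerate(ordered):
        # Count the earlier hits that end strictly before this hit starts.
        prev[i] = bisect_right(stops, hit.start - 1, 0, i)
        with_hit = best[prev[i]] + hit.score
        take[i] = with_hit > best[i]
        best[i + 1] = max(best[i], with_hit)
    # Walk back through the table to recover the chosen hits.
    chosen = []
    i = len(ordered)
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    return chosen[::-1]
```

Sorting by end position and using a binary search for the last compatible hit keeps the whole procedure at O(n log n) in the number of candidate hits, which is the usual reason a dynamic-programming formulation scales to large inputs.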
Affiliation(s)
- T E Lewis
  - Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK
- I Sillitoe
  - Department of Structural and Molecular Biology, UCL, Darwin Building, London, UK
- J G Lees
  - Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, Oxfordshire, UK
52
Ramasamy P, Turan D, Tichshenko N, Hulstaert N, Vandermarliere E, Vranken W, Martens L. Scop3P: A Comprehensive Resource of Human Phosphosites within Their Full Context. J Proteome Res 2020; 19:3478-3486. [DOI: 10.1021/acs.jproteome.0c00306] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/08/2023]
Affiliation(s)
- Pathmanaban Ramasamy
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  - Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Demet Turan
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Natalia Tichshenko
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Niels Hulstaert
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Elien Vandermarliere
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Wim Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  - Centre for Structural Biology, VIB, 1050 Brussels, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
53
Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res 2019; 47:D464-D474. [PMID: 30357411 PMCID: PMC6324064 DOI: 10.1093/nar/gky1004] [Citation(s) in RCA: 724] [Impact Index Per Article: 181.0] [Received: 09/14/2018] [Accepted: 10/11/2018] [Indexed: 02/06/2023]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a ‘Structural View of Biology.’
Affiliation(s)
- Stephen K Burley
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
  - Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Helen M Berman
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Charmi Bhikadiya
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Chunxiao Bi
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Li Chen
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Luigi Di Costanzo
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Cole Christie
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Ken Dalenberg
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Shuchismita Dutta
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Zukang Feng
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Sutapa Ghosh
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- David S Goodsell
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - The Scripps Research Institute, La Jolla, CA 92037, USA
- Rachel K Green
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Vladimir Guranovic
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Dmytro Guzenko
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Brian P Hudson
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Tara Kalro
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Yuhe Liang
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Robert Lowe
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Harry Namkoong
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Ezra Peisach
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Irina Periskova
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Andreas Prlic
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Chris Randle
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Alexander Rose
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Peter Rose
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Raul Sala
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Monica Sekharan
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Chenghua Shao
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Lihua Tan
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yi-Ping Tao
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yana Valasatava
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Maria Voigt
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- John Westbrook
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jesse Woo
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Huanwang Yang
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jasmine Young
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Marina Zhuravleva
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Christine Zardecki
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
54
Dana JM, Gutmanas A, Tyagi N, Qi G, O'Donovan C, Martin M, Velankar S. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res 2019; 47:D482-D489. [PMID: 30445541 PMCID: PMC6324003 DOI: 10.1093/nar/gky1114] [Citation(s) in RCA: 127] [Impact Index Per Article: 31.8] [Received: 09/07/2018] [Accepted: 10/22/2018] [Indexed: 12/12/2022]
Abstract
The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS; http://pdbe.org/sifts/) was established in 2002 and continues to operate as a collaboration between the Protein Data Bank in Europe (PDBe; http://pdbe.org) and the UniProt Knowledgebase (UniProtKB; http://uniprot.org). The resource is instrumental in the transfer of annotations between protein structure and protein sequence resources through provision of up-to-date residue-level mappings between entries from the PDB and from UniProtKB. SIFTS also incorporates residue-level annotations from other biological resources, currently comprising the NCBI taxonomy database, IntEnz, GO, Pfam, InterPro, SCOP, CATH, PubMed, Ensembl, Homologene and automatic Pfam domain assignments based on HMM profiles. The recently released implementation of SIFTS includes support for multiple cross-references for proteins in the PDB, allowing mappings to UniProtKB isoforms and UniRef90 cluster members. This development makes structure data in the PDB readily available to over 1.8 million UniProtKB accessions.
Affiliation(s)
- Jose M Dana
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aleksandras Gutmanas
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nidhi Tyagi
  - Protein Function Development, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guoying Qi
  - Protein Function Development, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire O'Donovan
  - Metabolomics, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Maria Martin
  - Protein Function Development, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
55
Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, Fish P, Harsha B, Hathaway C, Jupe SC, Kok CY, Noble K, Ponting L, Ramshaw CC, Rye CE, Speedy HE, Stefancsik R, Thompson SL, Wang S, Ward S, Campbell PJ, Forbes SA. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 2019; 47:D941-D947. [PMID: 30371878 PMCID: PMC6323903 DOI: 10.1093/nar/gky1015] [Citation(s) in RCA: 2684] [Impact Index Per Article: 671.0] [Received: 09/14/2018] [Accepted: 10/11/2018] [Indexed: 01/17/2023]
Abstract
COSMIC, the Catalogue Of Somatic Mutations In Cancer (https://cancer.sanger.ac.uk) is the most detailed and comprehensive resource for exploring the effect of somatic mutations in human cancer. The latest release, COSMIC v86 (August 2018), includes almost 6 million coding mutations across 1.4 million tumour samples, curated from over 26 000 publications. In addition to coding mutations, COSMIC covers all the genetic mechanisms by which somatic mutations promote cancer, including non-coding mutations, gene fusions, copy-number variants and drug-resistance mutations. COSMIC is primarily hand-curated, ensuring quality, accuracy and descriptive data capture. Building on our manual curation processes, we are introducing new initiatives that allow us to prioritize key genes and diseases, and to react more quickly and comprehensively to new findings in the literature. Alongside improvements to the public website and data-download systems, new functionality in COSMIC-3D allows exploration of mutations within three-dimensional protein structures, their protein structural and functional impacts, and implications for druggability. In parallel with COSMIC’s deep and broad variant coverage, the Cancer Gene Census (CGC) describes a curated catalogue of genes driving every form of human cancer. Currently describing 719 genes, the CGC has recently introduced functional descriptions of how each gene drives disease, summarized into the 10 cancer Hallmarks.
Affiliation(s)
- John G Tate
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sally Bamford
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Harry C Jubb
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  - Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge CB4 0QA, UK
- Zbyslaw Sondka
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- David M Beare
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Nidhi Bindal
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Harry Boutselakis
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Charlotte G Cole
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Celestino Creatore
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Elisabeth Dawson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peter Fish
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Bhavana Harsha
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Charlie Hathaway
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Steve C Jupe
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Chai Yin Kok
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Kate Noble
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Laura Ponting
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Claire E Rye
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Helen E Speedy
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ray Stefancsik
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sam L Thompson
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Shicai Wang
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sari Ward
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peter J Campbell
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Simon A Forbes
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
56
Singh G, Inoue A, Gutkind JS, Russell RB, Raimondi F. PRECOG: PREdicting COupling probabilities of G-protein coupled receptors. Nucleic Acids Res 2019; 47:W395-W401. [PMID: 31143927 PMCID: PMC6602504 DOI: 10.1093/nar/gkz392] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/10/2019] [Revised: 04/13/2019] [Accepted: 05/01/2019] [Indexed: 01/08/2023]
Abstract
G-protein coupled receptors (GPCRs) control multiple physiological states by transducing a multitude of extracellular stimuli into the cell via coupling to intra-cellular heterotrimeric G-proteins. Deciphering which G-proteins couple to each of the hundreds of GPCRs present in a typical eukaryotic organism is therefore critical to understand signalling. Here, we present PRECOG (precog.russelllab.org): a web-server for predicting GPCR coupling, which allows users to: (i) predict coupling probabilities for GPCRs to individual G-proteins instead of subfamilies; (ii) visually inspect the protein sequence and structural features that are responsible for a particular coupling; (iii) suggest mutations to rationally design artificial GPCRs with new coupling properties based on predetermined coupling features.
Affiliation(s)
- Gurdeep Singh
  - CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
  - Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany
- Asuka Inoue
  - Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
- J Silvio Gutkind
  - Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Robert B Russell
  - CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
  - Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany
- Francesco Raimondi
  - CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany
  - Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany
57
Metwally H, Tanaka T, Li S, Parajuli G, Kang S, Hanieh H, Hashimoto S, Chalise JP, Gemechu Y, Standley DM, Kishimoto T. Noncanonical STAT1 phosphorylation expands its transcriptional activity into promoting LPS-induced IL-6 and IL-12p40 production. Sci Signal 2020; 13:13/624/eaay0574. [DOI: 10.1126/scisignal.aay0574] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/26/2022]
Abstract
The lipopolysaccharide (LPS)–induced endocytosis of Toll-like receptor 4 (TLR4) is an essential step in the production of interferon-β (IFN-β), which activates the transcription of antiviral response genes by STAT1 phosphorylated at Tyr701. Here, we showed that STAT1 regulated proinflammatory cytokine production downstream of TLR4 endocytosis independently of IFN-β signaling and the key proinflammatory regulator NF-κB. In human macrophages, TLR4 endocytosis activated a noncanonical phosphorylation of STAT1 at Thr749, which subsequently promoted the production of interleukin-6 (IL-6) and IL-12p40 through distinct mechanisms. STAT1 phosphorylated at Thr749 activated the expression of the gene encoding ARID5A, which stabilizes IL6 mRNA. Moreover, STAT1 phosphorylated at Thr749 directly enhanced transcription of the gene encoding IL-12p40 (IL12B). Instead of affecting STAT1 nuclear translocation, phosphorylation of Thr749 facilitated the binding of STAT1 to a noncanonical DNA motif (5′-TTTGANNC-3′) in the promoter regions of ARID5A and IL12B. The endocytosis of TLR4 induced the formation of a complex between the kinases TBK1 and IKKβ, which mediated the phosphorylation of STAT1 at Thr749. Our data suggest that noncanonical phosphorylation in response to LPS confers STAT1 with distinct DNA binding and gene-regulatory properties that promote both IL12B expression and IL6 mRNA stabilization. Thus, our study provides a potential mechanism for how TLR4 endocytosis might regulate proinflammatory cytokine production.
Affiliation(s)
- Hozaifa Metwally
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Toshio Tanaka
  - Medical Affairs Bureau, Osaka Prefectural Hospital Organization, Osaka Habikino Medical Center, 583-8588 Osaka, Japan
- Songling Li
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 565-0871 Osaka, Japan
  - Laboratory of Systems Immunology, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Gyanu Parajuli
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Sujin Kang
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Hamza Hanieh
  - Department of Medical Analysis, Department of Biological Sciences, Al-Hussein Bin Talal University, 71111 Ma’an, Jordan
- Shigeru Hashimoto
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Jaya P. Chalise
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Yohannes Gemechu
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Daron M. Standley
  - Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 565-0871 Osaka, Japan
  - Laboratory of Systems Immunology, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
- Tadamitsu Kishimoto
  - Laboratory of Immune Regulation, Immunology Frontier Research Center, Osaka University, 565-0871 Osaka, Japan
58
Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020; 432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]
Abstract
The intricate details of how proteins bind to proteins, DNA, and RNA are crucial for the understanding of almost all biological processes. Disease-causing sequence variants often affect binding residues. Here, we described a new, comprehensive system of in silico methods that take only protein sequence as input to predict binding of protein to DNA, RNA, and other proteins. Firstly, we needed to develop several new methods to predict whether or not proteins bind (per-protein prediction). Secondly, we developed independent methods that predict which residues bind (per-residue). Not requiring three-dimensional information, the system can predict the actual binding residue. The system combined homology-based inference with machine learning and motif-based profile-kernel approaches with word-based (ProtVec) solutions to machine learning protein level predictions. This achieved an overall non-exclusive three-state accuracy of 77% ± 1% (±one standard error) corresponding to a 1.8 fold improvement over random (best classification for protein-protein with F1 = 91 ± 0.8%). Standard neural networks for per-residue binding residue predictions appeared best for DNA-binding (Q2 = 81 ± 0.9%) followed by RNA-binding (Q2 = 80 ± 1%) and worst for protein-protein binding (Q2 = 69 ± 0.8%). The new method, dubbed ProNA2020, is available as code through github (https://github.com/Rostlab/ProNA2020.git) and through PredictProtein (www.predictprotein.org).
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany.
| | - Michael Bernhofer
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
| | - Michael Heinzinger
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
| | - Sofie Kemper
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
| | - Tomas Norambuena
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Francisco Melo
- Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West, 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany
|
59
|
Feng SH, Zhang WX, Yang J, Yang Y, Shen HB. Topology Prediction Improvement of α-helical Transmembrane Proteins Through Helix-tail Modeling and Multiscale Deep Learning Fusion. J Mol Biol 2020; 432:1279-1296. [DOI: 10.1016/j.jmb.2019.12.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/16/2019] [Revised: 12/02/2019] [Accepted: 12/04/2019] [Indexed: 12/18/2022]
|
60
|
Goodsell DS, Zardecki C, Di Costanzo L, Duarte JM, Hudson BP, Persikova I, Segura J, Shao C, Voigt M, Westbrook JD, Young JY, Burley SK. RCSB Protein Data Bank: Enabling biomedical research and drug discovery. Protein Sci 2020; 29:52-65. [PMID: 31531901 PMCID: PMC6933845 DOI: 10.1002/pro.3730] [Citation(s) in RCA: 186] [Impact Index Per Article: 46.5] [Received: 07/31/2019] [Revised: 09/09/2019] [Accepted: 09/10/2019] [Indexed: 12/11/2022]
Abstract
Analyses of publicly available structural data reveal interesting insights into the impact of the three-dimensional (3D) structures of protein targets important for discovery of new drugs (e.g., G-protein-coupled receptors, voltage-gated ion channels, ligand-gated ion channels, transporters, and E3 ubiquitin ligases). The Protein Data Bank (PDB) archive currently holds > 155,000 atomic-level 3D structures of biomolecules experimentally determined using crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. The PDB was established in 1971 as the first open-access, digital-data resource in biology, and is now managed by the Worldwide PDB partnership (wwPDB; wwPDB.org). US PDB operations are the responsibility of the Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB). The RCSB PDB serves millions of RCSB.org users worldwide by delivering PDB data integrated with ∼40 external biodata resources, providing rich structural views of fundamental biology, biomedicine, and energy sciences. Recently published work showed that the PDB archival holdings facilitated discovery of ∼90% of the 210 new drugs approved by the US Food and Drug Administration 2010-2016. We review user-driven development of RCSB PDB services, examine growth of the PDB archive in terms of size and complexity, and present examples and opportunities for structure-guided drug discovery for challenging targets (e.g., integral membrane proteins).
Affiliation(s)
- David S. Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- The Scripps Research Institute, La Jolla, California
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Luigi Di Costanzo
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Jose M. Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California
| | - Brian P. Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - John D. Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Jasmine Y. Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Stephen K. Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California
- Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey
|
61
|
Ryl PSJ, Bohlke-Schneider M, Lenz S, Fischer L, Budzinski L, Stuiver M, Mendes MML, Sinn L, O'Reilly FJ, Rappsilber J. In Situ Structural Restraints from Cross-Linking Mass Spectrometry in Human Mitochondria. J Proteome Res 2019; 19:327-336. [PMID: 31746214 PMCID: PMC7010328 DOI: 10.1021/acs.jproteome.9b00541] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 02/07/2023]
Abstract
The field of structural biology is increasingly focusing on studying proteins in situ, i.e., in their greater biological context. Cross-linking mass spectrometry (CLMS) is contributing to this effort, typically through the use of mass spectrometry (MS)-cleavable cross-linkers. Here, we apply the popular noncleavable cross-linker disuccinimidyl suberate (DSS) to human mitochondria and identify 5518 distance restraints between protein residues. Each distance restraint on proteins or their interactions provides structural information within mitochondria. Comparing these restraints to Protein Data Bank (PDB)-deposited structures and comparative models reveals novel protein conformations. Our data suggest, among others, substrates and protein flexibility of mitochondrial heat shock proteins. Through this study, we bring forward two central points for the progression of CLMS towards large-scale in situ structural biology. First, clustered conflicts of cross-link data reveal in situ protein conformation states, in contrast to error-rich individual conflicts. Second, noncleavable cross-linkers are compatible with proteome-wide studies.
Affiliation(s)
- Petra S J Ryl
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Michael Bohlke-Schneider
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Swantje Lenz
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Lutz Fischer
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, Scotland, United Kingdom
| | - Lisa Budzinski
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Marchel Stuiver
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Marta M L Mendes
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Ludwig Sinn
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Francis J O'Reilly
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, Scotland, United Kingdom
|
62
|
Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019; 20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 228] [Impact Index Per Article: 45.6] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here. RESULTS We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis. CONCLUSION Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.
Affiliation(s)
- Michael Heinzinger
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
| | - Ahmed Elnaggar
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Yu Wang
- Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany
| | - Christian Dallago
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Dmitrii Nechaev
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
| | - Florian Matthes
- TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
|
63
|
Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019; 10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open
Abstract
Recent research shows that the majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have a significantly higher abundance of alternative splicing isoforms, a relatively large number of domains, a higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
Affiliation(s)
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Xingyi Li
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
|
64
|
Da Silva F, Bret G, Teixeira L, Gonzalez CF, Rognan D. Exhaustive Repertoire of Druggable Cavities at Protein-Protein Interfaces of Known Three-Dimensional Structure. J Med Chem 2019; 62:9732-9742. [PMID: 31603323 DOI: 10.1021/acs.jmedchem.9b01184] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/19/2022]
Abstract
Protein-protein interactions (PPIs) offer the unique opportunity to tailor ligands aimed at specifically stabilizing or disrupting the corresponding interfaces and providing a safer alternative to conventional ligands targeting monomeric macromolecules. Selecting biologically relevant protein-protein interfaces for either stabilization or disruption by small molecules is usually biology-driven on a case-by-case basis and does not follow a structural rationale that could be applied to an entire interactome. We herewith provide a first step toward the latter goal by using a fully automated and structure-based workflow, applicable to any PPI of known three-dimensional (3D) structure, to identify and prioritize druggable cavities at and near PPIs of pharmacological interest. When applied to the entire Protein Data Bank, 164 514 druggable cavities were identified and classified in four groups (interfacial, rim, allosteric, orthosteric) according to their properties and spatial locations. Systematic comparison of PPI cavities with pockets deduced from druggable protein-ligand complexes shows almost no overlap in property space, suggesting that even the most druggable PPI cavities are unlikely to be addressed with conventional drug-like compound libraries. The archive is freely accessible at http://drugdesign.unistra.fr/ppiome.
Affiliation(s)
- Franck Da Silva
- Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
| | - Guillaume Bret
- Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
| | - Leandro Teixeira
- Department of Microbiology and Cell Science, Genetics Institute, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida 32610-3610, United States
| | - Claudio F Gonzalez
- Department of Microbiology and Cell Science, Genetics Institute, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, Florida 32610-3610, United States
| | - Didier Rognan
- Laboratoire d'Innovation Thérapeutique, UMR 7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
|
65
|
Hoksza D, Gawron P, Ostaszewski M, Schneider R. MolArt: a molecular structure annotation and visualization tool. Bioinformatics 2019; 34:4127-4128. [PMID: 29931246 PMCID: PMC6247942 DOI: 10.1093/bioinformatics/bty489] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/26/2018] [Accepted: 06/13/2018] [Indexed: 11/16/2022] Open
Abstract
Summary MolArt fills the gap between sequence and structure visualization by providing a lightweight, interactive environment for exploring sequence annotations in the context of available experimental or predicted protein structures. Given a UniProt ID, MolArt downloads and displays sequence annotations, sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, so that sequence annotations can be color-overlaid on the mapped structures, thus providing an enhanced understanding and interpretation of the available molecular data. Availability and implementation MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.
Affiliation(s)
- David Hoksza
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg; Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám., 118 00 Prague, Czech Republic
| | - Piotr Gawron
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
|
66
|
Bordin N, Devos DP. ICBdocker: a Docker image for proteome annotation and visualization. Bioinformatics 2019; 34:3937-3938. [PMID: 29931249 DOI: 10.1093/bioinformatics/bty493] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/26/2018] [Accepted: 06/15/2018] [Indexed: 12/14/2022] Open
Abstract
Summary We introduce ICBdocker, a Docker environment that allows the annotation of functional and structural features of proteomes through a Python/Perl pipeline. DataTables pages make it easy to set up a web resource for research groups with a focus on the same organisms or datasets. The results are available as tab-separated values files and HTML, allowing data analysis and browsing. The pipeline focuses on modularity and scalability, with the capability of integrating with multi-processing and high-performance computing clusters. Availability and implementation ICBdocker is freely available on DockerHub at https://hub.docker.com/r/bordin89/icb/. Source code and documentation are available on GitHub at https://github.com/bordin89/ICB_docker.
Affiliation(s)
- Nicola Bordin
- Laboratory of Evolutionary Innovations, Centro Andaluz de Biologia del Desarollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera, Seville, Spain
| | - Damien P Devos
- Laboratory of Evolutionary Innovations, Centro Andaluz de Biologia del Desarollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera, Seville, Spain
|
67
|
Lu H, Li F, Sánchez BJ, Zhu Z, Li G, Domenzain I, Marcišauskas S, Anton PM, Lappa D, Lieven C, Beber ME, Sonnenschein N, Kerkhoven EJ, Nielsen J. A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nat Commun 2019; 10:3586. [PMID: 31395883 PMCID: PMC6687777 DOI: 10.1038/s41467-019-11581-3] [Citation(s) in RCA: 154] [Impact Index Per Article: 30.8] [Received: 03/19/2019] [Accepted: 07/17/2019] [Indexed: 01/06/2023] Open
Abstract
Genome-scale metabolic models (GEMs) represent extensive knowledgebases that provide a platform for model simulations and integrative analysis of omics data. This study introduces Yeast8 and an associated ecosystem of models that represent a comprehensive computational resource for performing simulations of the metabolism of Saccharomyces cerevisiae, an important model organism and widely used cell factory. Yeast8 tracks community development with version control, setting a standard for how GEMs can be continuously updated in a simple and reproducible way. We use Yeast8 to develop the derived models panYeast8 and coreYeast8, which in turn enable the reconstruction of GEMs for 1,011 different yeast strains. Through integration with enzyme constraints (ecYeast8) and protein 3D structures (proYeast8DB), Yeast8 further facilitates the exploration of yeast metabolism at a multi-scale level, enabling prediction of how single nucleotide variations translate to phenotypic traits.
Affiliation(s)
- Hongzhong Lu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Feiran Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Benjamín J Sánchez
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Zhengming Zhu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
- School of Biotechnology, Jiangnan University, 1800 Lihu Road, 214122, Wuxi, Jiangsu, China
| | - Gang Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Iván Domenzain
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Simonas Marcišauskas
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Petre Mihail Anton
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Dimitra Lappa
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Christian Lieven
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
| | - Moritz Emanuel Beber
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
| | - Nikolaus Sonnenschein
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark
| | - Eduard J Kerkhoven
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden
| | - Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE412 96, Gothenburg, Sweden.
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs, Lyngby, Denmark.
- BioInnovation Institute, Ole Maaløes Vej 3, DK2200, Copenhagen N, Denmark.
|
68
|
Pravda L, Sehnal D, Toušek D, Navrátilová V, Bazgier V, Berka K, Svobodová Vareková R, Koca J, Otyepka M. MOLEonline: a web-based tool for analyzing channels, tunnels and pores (2018 update). Nucleic Acids Res 2019; 46:W368-W373. [PMID: 29718451 PMCID: PMC6030847 DOI: 10.1093/nar/gky309] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Received: 01/29/2018] [Accepted: 04/12/2018] [Indexed: 12/27/2022] Open
Abstract
MOLEonline is an interactive, web-based application for the detection and characterization of channels (pores and tunnels) within biomacromolecular structures. The updated version of MOLEonline overcomes limitations of the previous version by incorporating the recently developed LiteMol Viewer visualization engine and providing a simple, fully interactive user experience. The application enables two modes of calculation: one is dedicated to the analysis of channels while the other was specifically designed for transmembrane pores. As the application can use both PDB and mmCIF formats, it can be leveraged to analyze a wide spectrum of biomacromolecular structures, e.g. stemming from NMR, X-ray and cryo-EM techniques. The tool is interconnected with other bioinformatics tools (e.g., PDBe, CSA, ChannelsDB, OPM, UniProt) to help with both setup and the analysis of acquired results. MOLEonline provides unprecedented analytics for the detection and structural characterization of channels, as well as information about their numerous physicochemical features. Here we present the application of MOLEonline for structural analyses of α-hemolysin and transient receptor potential mucolipin 1 (TRPML1) pores. The MOLEonline application is freely available via the Internet at https://mole.upol.cz.
Affiliation(s)
- Lukáš Pravda
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
| | - David Sehnal
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
| | - Dominik Toušek
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic; Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. Listopadu 12, 771 46 Olomouc, Czech Republic
| | - Veronika Navrátilová
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. Listopadu 12, 771 46 Olomouc, Czech Republic
| | - Václav Bazgier
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. Listopadu 12, 771 46 Olomouc, Czech Republic
| | - Karel Berka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. Listopadu 12, 771 46 Olomouc, Czech Republic
| | - Radka Svobodová Vareková
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
| | - Jaroslav Koca
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
| | - Michal Otyepka
- Regional Centre of Advanced Technologies and Materials, Department of Physical Chemistry, Faculty of Science, Palacký University, tr. 17. Listopadu 12, 771 46 Olomouc, Czech Republic
|
69
|
Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, Orengo C, Lees J. Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Res 2019; 46:D435-D439. [PMID: 29112716 PMCID: PMC5753370 DOI: 10.1093/nar/gkx1069] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 11/28/2022] Open
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.
Affiliation(s)
- Tony E Lewis
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
| | - Natalie Dawson
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
| | - Su Datt Lam
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK; School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Tristan Clarke
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
| | - David Lee
- Life Sciences Building, University of Bristol, Bristol, BS8 1TQ, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
| | - Jonathan Lees
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK; Oxford Brookes University, Faculty of Health and Life Sciences, Oxford, Oxfordshire, UK
|
70
|
Verma R, Pandit SB. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS One 2019; 14:e0220336. [PMID: 31374091 PMCID: PMC6677297 DOI: 10.1371/journal.pone.0220336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/12/2019] [Accepted: 07/12/2019] [Indexed: 12/22/2022] Open
Abstract
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through a physical interaction at domain-domain interfaces (DDIs). With a motivation to understand evolution of interfaces, we have investigated similarities among DDIs. Even though interfaces of protein-protein interactions (PPIs) have been previously studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. For studying the structural landscape of DDIs, we have used iAlign to structurally align intra-chain domain interfaces of domains. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that ~88% of these could identify a structural matching interface having similar C-alpha geometry and contact pattern despite that aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher compared to the average random IS-score (0.207) suggesting domain interfaces are not random. The structural space of DDIs is highly connected as ~84% of all possible directed edges among interfaces are found to have at most path length of 8 when 0.26 is IS-score threshold. At this threshold, ~83% of interfaces form the largest strongly connected component. Thus, suggesting that structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found in PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that ~86% could find a statistically significant match to intra-chain interface with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to limited possible ways of packing secondary structures. 
In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins.
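The connectivity figure quoted in this abstract (the share of interfaces in the largest strongly connected component at an IS-score threshold of 0.26) can be illustrated with a minimal sketch. The function name `largest_scc_fraction`, the toy interface labels, and the toy scores are hypothetical; the abstract does not describe how the authors built or analysed their graph, so this only shows the general idea using Kosaraju's two-pass algorithm.

```python
from typing import Dict, List, Sequence, Set, Tuple

def largest_scc_fraction(
    nodes: Sequence[str],
    scored_edges: Sequence[Tuple[str, str, float]],
    threshold: float = 0.26,
) -> float:
    """Fraction of nodes in the largest strongly connected component,
    keeping only directed edges whose score reaches the threshold."""
    fwd: Dict[str, List[str]] = {n: [] for n in nodes}
    rev: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst, score in scored_edges:
        if score >= threshold:
            fwd[src].append(dst)
            rev[dst].append(src)

    # Pass 1: record DFS finishing order on the forward graph (iterative).
    seen: Set[str] = set()
    order: List[str] = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            nxt = next((m for m in neighbours if m not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))

    # Pass 2: sweep the reversed graph in decreasing finishing order;
    # each sweep collects exactly one strongly connected component.
    assigned: Set[str] = set()
    best = 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size = 0
        todo = [start]
        while todo:
            node = todo.pop()
            size += 1
            for m in rev[node]:
                if m not in assigned:
                    assigned.add(m)
                    todo.append(m)
        best = max(best, size)
    return best / len(nodes) if nodes else 0.0

# Toy data (hypothetical): the d -> a edge falls below the threshold.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b", 0.30), ("b", "c", 0.28), ("c", "a", 0.40),
             ("c", "d", 0.31), ("d", "a", 0.20)]
toy_fraction = largest_scc_fraction(toy_nodes, toy_edges)
```

In the toy graph, interfaces a, b and c reach each other in both directions while d is only reachable one way, so three of the four nodes fall in the largest component.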
Affiliation(s)
- Rivi Verma
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
| | - Shashi Bhushan Pandit
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
|
71
|
Wang J, Sheridan R, Sumer SO, Schultz N, Xu D, Gao J. G2S: a web-service for annotating genomic variants on 3D protein structures. Bioinformatics 2019; 34:1949-1950. [PMID: 29385402 DOI: 10.1093/bioinformatics/bty047] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/03/2017] [Accepted: 01/26/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. Results We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. Availability and implementation The webserver and source codes are freely available at https://g2s.genomenexus.org. Contact g2s@genomenexus.org. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Juexin Wang
- Department of Electrical Engineering & Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Robert Sheridan
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - S Onur Sumer
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Nikolaus Schultz
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Epidemiology-Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Dong Xu
- Department of Electrical Engineering & Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Jianjiong Gao
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
|
72
|
Scheibenreif L, Littmann M, Orengo C, Rost B. FunFam protein families improve residue level molecular function prediction. BMC Bioinformatics 2019; 20:400. [PMID: 31319797 PMCID: PMC6639920 DOI: 10.1186/s12859-019-2988-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/18/2019] [Accepted: 07/09/2019] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
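The thresholded consensus described in this abstract can be sketched roughly as follows. The function `consensus_binding`, the protein labels, and the boolean encoding of per-column predictions are assumptions made for illustration; the abstract does not give the authors' implementation or say how alignment gaps are handled, so the sketch assumes every family member has a prediction at every aligned column.

```python
from typing import Dict, List

def consensus_binding(predictions: Dict[str, List[bool]], min_agree: float) -> List[bool]:
    """Call an aligned column 'binding' when at least `min_agree`
    (a fraction between 0 and 1) of family members predict binding there."""
    members = list(predictions.values())
    if not members:
        return []
    n_cols = len(members[0])
    if any(len(m) != n_cols for m in members):
        raise ValueError("all members must cover the same aligned columns")
    consensus = []
    for col in range(n_cols):
        votes = sum(m[col] for m in members)  # True counts as 1
        consensus.append(votes / len(members) >= min_agree)
    return consensus

# Hypothetical family of three aligned proteins, four alignment columns.
family = {
    "protA": [True, True, False, False],
    "protB": [True, False, False, True],
    "protC": [True, True, False, False],
}
lenient = consensus_binding(family, min_agree=0.5)
strict = consensus_binding(family, min_agree=1.0)
```

Raising `min_agree` keeps only columns with broad agreement, which mirrors the trade-off between precision and coverage that the abstract attributes to varying the agreement threshold.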
Affiliation(s)
- Linus Scheibenreif
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Maria Littmann
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY 10032, USA
|
73
|
Profiti G, Martelli PL, Casadio R. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation. Nucleic Acids Res 2019; 45:W285-W290. [PMID: 28453653 PMCID: PMC5570247 DOI: 10.1093/nar/gkx330] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/31/2017] [Accepted: 04/18/2017] [Indexed: 01/03/2023] Open
Abstract
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3.
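The pairwise criteria quoted in this abstract (sequence identity ≥40% with alignment coverage ≥90%) can be illustrated with a small clustering sketch. Treating clusters as connected components of the similarity graph is an assumption made here, since the abstract only says the procedure is graph-based; the function `cluster_sequences`, the sequence labels, and the toy alignment values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (sequence A, sequence B, percent identity, percent alignment coverage)
Pair = Tuple[str, str, float, float]

def cluster_sequences(
    ids: Sequence[str],
    pairs: Sequence[Pair],
    min_identity: float = 40.0,
    min_coverage: float = 90.0,
) -> List[List[str]]:
    """Group sequences into connected components, linking two sequences
    only when their alignment meets both thresholds (union-find)."""
    parent: Dict[str, str] = {s: s for s in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity, coverage in pairs:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for s in ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in clusters.values())

# Toy data (hypothetical): s3-s4 has high identity but fails the coverage cut.
toy_ids = ["s1", "s2", "s3", "s4"]
toy_pairs = [("s1", "s2", 55.0, 95.0),
             ("s2", "s3", 42.0, 91.0),
             ("s3", "s4", 80.0, 60.0)]
toy_clusters = cluster_sequences(toy_ids, toy_pairs)
```

Here s4 ends up alone, which corresponds to what the abstract calls a singleton.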
Affiliation(s)
- Giuseppe Profiti
- Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
| | - Rita Casadio
- Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
|
74
|
Schaefer M, Clevert DA, Weiss B, Steffen A. PAVOOC: designing CRISPR sgRNAs using 3D protein structures and functional domain annotations. Bioinformatics 2019; 35:2309-2310. [PMID: 30445568 PMCID: PMC6596878 DOI: 10.1093/bioinformatics/bty935] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/22/2018] [Revised: 10/01/2018] [Accepted: 11/09/2018] [Indexed: 12/26/2022] Open
Abstract
SUMMARY Single-guide RNAs (sgRNAs) targeting the same gene can significantly vary in terms of efficacy and specificity. PAVOOC (Prediction And Visualization of On- and Off-targets for CRISPR) is a web-based CRISPR sgRNA design tool that employs state of the art machine learning models to prioritize most effective candidate sgRNAs. In contrast to other tools, it maps sgRNAs to functional domains and protein structures and visualizes cut sites on corresponding protein crystal structures. Furthermore, PAVOOC supports homology-directed repair template generation for genome editing experiments and the visualization of the mutated amino acids in 3D. AVAILABILITY AND IMPLEMENTATION PAVOOC is available under https://pavooc.me and accessible using modern browsers (Chrome/Chromium recommended). The source code is hosted at github.com/moritzschaefer/pavooc under the MIT License. The backend, including data processing steps, and the frontend are implemented in Python 3 and ReactJS, respectively. All components run in a simple Docker environment. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
75
|
Konc J, Skrlj B, Erzen N, Kunej T, Janezic D. GenProBiS: web server for mapping of sequence variants to protein binding sites. Nucleic Acids Res 2019; 45:W253-W259. [PMID: 28498966 PMCID: PMC5570222 DOI: 10.1093/nar/gkx420] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/20/2017] [Accepted: 05/02/2017] [Indexed: 02/02/2023] Open
Abstract
Discovery of potentially deleterious sequence variants is important and has wide implications for research and generation of new hypotheses in human and veterinary medicine, and drug discovery. The GenProBiS web server maps sequence variants to protein structures from the Protein Data Bank (PDB), and further to protein–protein, protein–nucleic acid, protein–compound, and protein–metal ion binding sites. The concept of a protein–compound binding site is understood in the broadest sense, which includes glycosylation and other post-translational modification sites. Binding sites were defined by local structural comparisons of whole protein structures using the Protein Binding Sites (ProBiS) algorithm and transposition of ligands from the similar binding sites found to the query protein using the ProBiS-ligands approach with new improvements introduced in GenProBiS. Binding site surfaces were generated as three-dimensional grids encompassing the space occupied by predicted ligands. The server allows intuitive visual exploration of comprehensively mapped variants, such as human somatic mis-sense mutations related to cancer and non-synonymous single nucleotide polymorphisms from 21 species, within the predicted binding sites regions for about 80 000 PDB protein structures using fast WebGL graphics. The GenProBiS web server is open and free to all users at http://genprobis.insilab.org.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia; University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, 6000 Koper, Slovenia
| | - Blaz Skrlj
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
| | - Nika Erzen
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
| | - Tanja Kunej
- Biotechnical Faculty, University of Ljubljana, 1000 Ljubljana, Slovenia
| | - Dusanka Janezic
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, 6000 Koper, Slovenia
|
76
|
Inoue A, Raimondi F, Kadji FMN, Singh G, Kishi T, Uwamizu A, Ono Y, Shinjo Y, Ishida S, Arang N, Kawakami K, Gutkind JS, Aoki J, Russell RB. Illuminating G-Protein-Coupling Selectivity of GPCRs. Cell 2019; 177:1933-1947.e25. [PMID: 31160049 DOI: 10.1016/j.cell.2019.04.044] [Citation(s) in RCA: 318] [Impact Index Per Article: 63.6] [Received: 10/25/2018] [Revised: 01/28/2019] [Accepted: 04/25/2019] [Indexed: 12/20/2022]
Abstract
Heterotrimeric G proteins consist of four subfamilies (Gs, Gi/o, Gq/11, and G12/13) that mediate signaling via G-protein-coupled receptors (GPCRs), principally by receptors binding Gα C termini. G-protein-coupling profiles govern GPCR-induced cellular responses, yet receptor sequence selectivity determinants remain elusive. Here, we systematically quantified ligand-induced interactions between 148 GPCRs and all 11 unique Gα subunit C termini. For each receptor, we probed chimeric Gα subunit activation via a transforming growth factor-α (TGF-α) shedding response in HEK293 cells lacking endogenous Gq/11 and G12/13 proteins, and complemented G-protein-coupling profiles through a NanoBiT-G-protein dissociation assay. Interrogation of the dataset identified sequence-based coupling specificity features, inside and outside the transmembrane domain, which we used to develop a coupling predictor that outperforms previous methods. We used the predictor to engineer designer GPCRs selectively coupled to G12. This dataset of fine-tuned signaling mechanisms for diverse GPCRs is a valuable resource for research in GPCR signaling.
Affiliation(s)
- Asuka Inoue
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan; Advanced Research & Development Programs for Medical Innovation (PRIME), Japan Agency for Medical Research and Development (AMED), Chiyoda-ku, Tokyo 100-0004, Japan; Advanced Research & Development Programs for Medical Innovation (LEAP), AMED, Chiyoda-ku, Tokyo 100-0004, Japan.
| | - Francesco Raimondi
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany.
| | - Gurdeep Singh
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany
| | - Takayuki Kishi
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - Akiharu Uwamizu
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - Yuki Ono
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - Yuji Shinjo
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - Satoru Ishida
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - Nadia Arang
- Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Kouki Kawakami
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan
| | - J Silvio Gutkind
- Department of Pharmacology and Moores Cancer Center, University of California, San Diego, La Jolla, CA 92093, USA
| | - Junken Aoki
- Graduate School of Pharmaceutical Sciences, Tohoku University, Sendai, Miyagi 980-8578, Japan; Advanced Research & Development Programs for Medical Innovation (LEAP), AMED, Chiyoda-ku, Tokyo 100-0004, Japan
| | - Robert B Russell
- CellNetworks, Bioquant, Heidelberg University, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany.
|
77
|
Functional and Structural Features of Disease-Related Protein Variants. Int J Mol Sci 2019; 20:ijms20071530. [PMID: 30934684 PMCID: PMC6479756 DOI: 10.3390/ijms20071530] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/08/2019] [Revised: 03/22/2019] [Accepted: 03/22/2019] [Indexed: 12/28/2022] Open
Abstract
Modern sequencing technologies provide an unprecedented amount of data of single-nucleotide variations occurring in coding regions and leading to changes in the expressed protein sequences. A significant fraction of these single-residue variations is linked to disease onset and collected in public databases. In recent years, many scientific studies have been focusing on the dissection of salient features of disease-related variations from different perspectives. In this work, we complement previous analyses by updating a dataset of disease-related variations occurring in proteins with 3D structure. Within this dataset, we describe functional and structural features that can be of interest for characterizing disease-related variations, including major chemico-physical properties, the strength of association to disease of variation types, their effect on protein stability, their location on the protein structure, and their distribution in Pfam structural/functional protein models. Our results support previous findings obtained in different data sets and introduce Pfam models as possible fingerprints of patterns of disease related single-nucleotide variations.
|
78
|
Sayılgan JF, Haliloğlu T, Gönen M. Protein dynamics analysis reveals that missense mutations in cancer‐related genes appear frequently on hinge‐neighboring residues. Proteins 2019; 87:512-519. [DOI: 10.1002/prot.25673] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/18/2018] [Revised: 01/09/2019] [Accepted: 02/17/2019] [Indexed: 01/26/2023]
Affiliation(s)
- Jan Fehmi Sayılgan
- Graduate School of Sciences and Engineering, Koç University, İstanbul, Turkey
| | - Türkan Haliloğlu
- Department of Chemical Engineering, School of Engineering, Boğaziçi University, İstanbul, Turkey
- Polymer Research Center, Boğaziçi University, İstanbul, Turkey
| | - Mehmet Gönen
- Department of Industrial Engineering, College of Engineering, Koç University, İstanbul, Turkey
- School of Medicine, Koç University, İstanbul, Turkey
- Department of Biomedical Engineering, School of Medicine, Oregon Health and Science University, Portland, Oregon
|
79
|
Kulandaisamy A, Priya SB, Sakthivel R, Frishman D, Gromiha MM. Statistical analysis of disease‐causing and neutral mutations in human membrane proteins. Proteins 2019; 87:452-466. [DOI: 10.1002/prot.25667] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/20/2018] [Revised: 01/16/2019] [Accepted: 01/31/2019] [Indexed: 11/11/2022]
Affiliation(s)
- A. Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - S. Binny Priya
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - R. Sakthivel
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
| | - Dmitrij Frishman
- Department of Bioinformatics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
|
80
|
Ganesan K, Kulandaisamy A, Binny Priya S, Gromiha MM. HuVarBase: A human variant database with comprehensive information at gene and protein levels. PLoS One 2019; 14:e0210475. [PMID: 30703169 PMCID: PMC6354970 DOI: 10.1371/journal.pone.0210475] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/17/2018] [Accepted: 12/25/2018] [Indexed: 12/20/2022] Open
Abstract
Human variant databases could be better exploited if the variant data available in multiple resources is integrated in a single comprehensive resource along with sequence and structural features. Such integration would improve the analyses of variants for disease prediction, prevention or treatment. The HuVarBase (HUmanVARiantdataBASE) assimilates publicly available human variant data at protein level and gene level into a comprehensive resource. Protein level data such as amino acid sequence, secondary structure of the mutant residue, domain, function, subcellular location and post-translational modification are integrated with gene level data such as gene name, chromosome number & genome position, DNA mutation, mutation type origin and rs ID number. Disease class has been added for the disease causing variants. The database is publicly available at https://www.iitm.ac.in/bioinfo/huvarbase. A total of 774,863 variant records, integrated in the HuVarBase, can be searched with options to display, visualize and download the results.
Affiliation(s)
- Kaliappan Ganesan
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, India
| | - A. Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, India
| | - S. Binny Priya
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, India
| | - M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, India
- Advanced Computational Drug Discovery Unit (ACDD), Institute of Innovative Research, Tokyo Institute of Technology, Midori-ku, Yokohama, Kanagawa, Japan
|
81
|
Litfin T, Yang Y, Zhou Y. SPOT-Peptide: Template-Based Prediction of Peptide-Binding Proteins and Peptide-Binding Sites. J Chem Inf Model 2019; 59:924-930. [DOI: 10.1021/acs.jcim.8b00777] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 01/01/2023]
Affiliation(s)
- Thomas Litfin
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
| | - Yuedong Yang
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong 510006, China
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
|
82
|
Munk C, Mutt E, Isberg V, Nikolajsen LF, Bibbe JM, Flock T, Hanson MA, Stevens RC, Deupi X, Gloriam DE. An online resource for GPCR structure determination and analysis. Nat Methods 2019; 16:151-162. [PMID: 30664776 DOI: 10.1038/s41592-018-0302-x] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 04/18/2018] [Accepted: 12/14/2018] [Indexed: 01/08/2023]
Abstract
G-protein-coupled receptors (GPCRs) transduce physiological and sensory stimuli into appropriate cellular responses and mediate the actions of one-third of drugs. GPCR structural studies have revealed the general bases of receptor activation, signaling, drug action and allosteric modulation, but so far cover only 13% of nonolfactory receptors. We broadly surveyed the receptor modifications/engineering and methods used to produce all available GPCR crystal and cryo-electron microscopy (cryo-EM) structures, and present an interactive resource integrated in GPCRdb ( http://www.gpcrdb.org ) to assist users in designing constructs and browsing appropriate experimental conditions for structure studies.
Affiliation(s)
- Christian Munk
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Eshita Mutt
- Paul Scherrer Institute, Villigen, Switzerland
| | - Vignir Isberg
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Novozymes A/S, Copenhagen, Denmark
| | - Louise F Nikolajsen
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Janne M Bibbe
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Raymond C Stevens
- Departments of Biological Sciences and Chemistry, Bridge Institute, USC Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA, USA
- iHuman Institute, ShanghaiTech University, Shanghai, China
| | - David E Gloriam
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| |
|
83
|
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJ, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SC, Yong SY, Finn RD. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 2019; 47:D351-D360. [PMID: 30398656 PMCID: PMC6323941 DOI: 10.1093/nar/gky1100] [Citation(s) in RCA: 980] [Impact Index Per Article: 196.0] [Received: 09/27/2018] [Revised: 10/19/2018] [Accepted: 10/22/2018] [Indexed: 12/15/2022] Open
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
Affiliation(s)
- Alex L Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Teresa K Attwood
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
| | - Patricia C Babbitt
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
| | - Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstr.1, 69117 Heidelberg, Germany
| | - Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Shoshana D Brown
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
| | - Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sara El-Gebali
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew I Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julian Gough
- Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
| | - David R Haft
- J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
| | - Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
| | - Ivica Letunic
- Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
| | - Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Aurélien Luciani
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Fabio Madeira
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
| | - Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Department of Agricultural Sciences, University of Udine, via Palladio 8, 33100 Udine, Italy
- Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all’Adige, Italy
| | - Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christine Orengo
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
| | - Arun P Pandurangan
- Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
| | - Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matloob A Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Neil D Rawlings
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Catherine Rivoire
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christian J A Sigrist
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Ian Sillitoe
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
| | - Granger G Sutton
- J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
| | - Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Siew-Yit Yong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
84
|
Abstract
The native state of proteins is composed of conformers in dynamical equilibrium. In this chapter, different issues related to conformational diversity are explored using a curated and experimentally based database called CoDNaS (Conformational Diversity in the Native State). This database is a collection of redundant structures for the same sequence. CoDNaS estimates the degree of conformational diversity using different global and local structural similarity measures. It allows the user to explore how structural differences among conformers change as a function of several structural features providing further biological information. This chapter explores the measurement of conformational diversity and its relationship with sequence divergence. Also, it discusses how proteins with high conformational diversity could affect homology modeling techniques.
Affiliation(s)
- Alexander Miguel Monzon
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
| | - Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
| | - Diego Javier Zea
- Structural Bioinformatics Unit, Fundación Instituto Leloir, CONICET, Buenos Aires, Argentina
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Argentina
| |
|
85
|
Midlik A, Hutařová Vařeková I, Hutař J, Moturu TR, Navrátilová V, Koča J, Berka K, Svobodová Vařeková R. Automated Family-Wide Annotation of Secondary Structure Elements. Methods Mol Biol 2019; 1958:47-71. [PMID: 30945213 DOI: 10.1007/978-1-4939-9161-7_3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]
Abstract
Secondary structure elements (SSEs) are inherent parts of protein structures, and their arrangement is characteristic for each protein family. Therefore, annotation of SSEs can facilitate orientation in the vast number of homologous structures which is now available for many protein families. It also provides a way to identify and annotate the key regions, like active sites and channels, and subsequently answer the key research questions, such as understanding of molecular function and its variability. This chapter introduces the concept of SSE annotation and describes the workflow for obtaining SSE annotation for the members of a selected protein family using the program SecStrAnnotator.
Affiliation(s)
- Adam Midlik
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
| | - Ivana Hutařová Vařeková
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
| | - Jan Hutař
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
| | - Taraka Ramji Moturu
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Veronika Navrátilová
- Faculty of Science, Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Palacký University, Olomouc, Czech Republic
| | - Jaroslav Koča
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
| | - Karel Berka
- Faculty of Science, Department of Physical Chemistry, Regional Centre of Advanced Technologies and Materials, Palacký University, Olomouc, Czech Republic
| | - Radka Svobodová Vařeková
- CEITEC-Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
| |
|
86
|
Wagih O, Galardini M, Busby BP, Memon D, Typas A, Beltrao P. A resource of variant effect predictions of single nucleotide variants in model organisms. Mol Syst Biol 2018; 14:e8430. [PMID: 30573687 PMCID: PMC6301329 DOI: 10.15252/msb.20188430] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/02/2018] [Revised: 11/19/2018] [Accepted: 11/21/2018] [Indexed: 12/18/2022] Open
Abstract
The effect of single nucleotide variants (SNVs) in coding and noncoding regions is of great interest in genetics. Although many computational methods aim to elucidate the effects of SNVs on cellular mechanisms, it is not straightforward to comprehensively cover different molecular effects. To address this, we compiled and benchmarked sequence and structure-based variant effect predictors and we computed the impact of nearly all possible amino acid and nucleotide variants in the reference genomes of Homo sapiens, Saccharomyces cerevisiae and Escherichia coli. Studied mechanisms include protein stability, interaction interfaces, post-translational modifications and transcription factor binding sites. We apply this resource to the study of natural and disease coding variants. We also show how variant effects can be aggregated to generate protein complex burden scores that uncover protein complex to phenotype associations based on a set of newly generated growth profiles of 93 sequenced S. cerevisiae strains in 43 conditions. This resource is available through mutfunc (www.mutfunc.com), a tool by which users can query precomputed predictions by providing amino acid or nucleotide-level variants.
Affiliation(s)
- Omar Wagih
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Marco Galardini
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Bede P Busby
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Danish Memon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Athanasios Typas
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| |
|
87
|
Alborzi SZ, Ritchie DW, Devignes MD. Computational discovery of direct associations between GO terms and protein domains. BMC Bioinformatics 2018; 19:413. [PMID: 30453875 PMCID: PMC6245584 DOI: 10.1186/s12859-018-2380-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. RESULTS We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner" for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively. CONCLUSIONS These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
Affiliation(s)
| | - David W Ritchie
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-54500, France
| |
|
88
|
Zhang Q, Sahana G, Su G, Guldbrandtsen B, Lund MS, Calus MPL. Impact of rare and low-frequency sequence variants on reliability of genomic prediction in dairy cattle. Genet Sel Evol 2018; 50:62. [PMID: 30458700 PMCID: PMC6247626 DOI: 10.1186/s12711-018-0432-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2018] [Accepted: 11/14/2018] [Indexed: 11/05/2022] Open
Abstract
Background Availability of whole-genome sequence data for a large number of cattle and efficient imputation methodologies open a new opportunity to include rare and low-frequency variants (RLFV) in genomic prediction in dairy cattle. The objective of this study was to examine the impact of including RLFV that are within genes and selected from whole-genome sequence variants, on the reliability of genomic prediction for fertility, health and longevity in dairy cattle. Results All genic RLFV with a minor allele frequency lower than 0.05 were extracted from imputed sequence data and subsets were created using different strategies. These subsets were subsequently combined with Illumina 50 k single nucleotide polymorphism (SNP) data and used for genomic prediction. Reliability of prediction obtained by using 50 k SNP data alone was used as reference value and absolute changes in reliabilities are referred to as changes in percentage points. Adding a component that included either all the genic or a subset of selected RLFV into the model in addition to the 50 k component changed the reliability of predictions by −2.2 to 1.1%, i.e. hardly any change in reliability of prediction was found, regardless of how the RLFV were selected. In addition to these empirical analyses, a simulation study was performed to evaluate the potential impact of adding RLFV in the model on the reliability of prediction. Three sets of causal RLFV (containing 21,468, 1348 and 235 RLFV) that were randomly selected from different numbers of genes were generated and accounted for 10% additional genetic variance of the estimated variance explained by the 50 k SNPs. When genic RLFV based on mapping results were included in the prediction model, reliabilities improved by up to 4.0% and when the causal RLFV were included they improved by up to 6.8%. Conclusions Using selected RLFV from whole-genome sequence data had only a small impact on the empirical reliability of genomic prediction in dairy cattle. Our simulations revealed that for sequence data to bring a benefit, the key is to identify causal RLFV. Electronic supplementary material The online version of this article (10.1186/s12711-018-0432-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
- Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Guosheng Su
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, Denmark
| | - Mario P L Calus
- Wageningen University and Research, Animal Breeding and Genomics, Wageningen, The Netherlands
| |
|
89
|
Bittrich S, Schroeder M, Labudde D. Characterizing the relation of functional and Early Folding Residues in protein structures using the example of aminoacyl-tRNA synthetases. PLoS One 2018; 13:e0206369. [PMID: 30376559 PMCID: PMC6207335 DOI: 10.1371/journal.pone.0206369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2018] [Accepted: 10/11/2018] [Indexed: 01/10/2023] Open
Abstract
Proteins are chains of amino acids which adopt a three-dimensional structure and are then able to catalyze chemical reactions or propagate signals in organisms. Without external influence, many proteins fold into their native structure, and a small number of Early Folding Residues (EFR) have previously been shown to initiate the formation of secondary structure elements and guide their respective assembly. Using the two diverse superfamilies of aminoacyl-tRNA synthetases (aaRS), it is shown that the position of EFR is preserved over the course of evolution even when the corresponding sequence conservation is small. Folding initiation sites are positioned in the center of secondary structure elements, independent of aaRS class. In class I, the predicted position of EFR resembles an ancient structural packing motif present in many seemingly unrelated proteins. Furthermore, it is shown that EFR and functionally relevant residues in aaRS are almost entirely disjoint sets of residues. The Start2Fold database is used to investigate whether this separation of EFR and functional residues can be observed for other proteins. EFR are found to constitute crucial connectors of protein regions which are distant at sequence level. In particular, these residues exhibit a high number of non-covalent residue-residue contacts such as hydrogen bonds and hydrophobic interactions. This tendency also manifests as energetically stable local regions, as substantiated by a knowledge-based potential. Despite profound differences regarding how EFR and functional residues are embedded in protein structures, a strict separation of structurally and functionally relevant residues cannot be observed for a more general collection of proteins.
Affiliation(s)
- Sebastian Bittrich
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
| | - Michael Schroeder
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
| | - Dirk Labudde
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
| |
|
90
|
Sun J, Yang LL, Chen X, Kong DX, Liu R. Integrating Multifaceted Information to Predict Mycobacterium tuberculosis-Human Protein-Protein Interactions. J Proteome Res 2018; 17:3810-3823. [PMID: 30269499 DOI: 10.1021/acs.jproteome.8b00497] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022]
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), is one of the biggest infectious disease killers. Studying the protein-protein interactions (PPIs) between MTB and humans can deepen our understanding of the pathogenesis of TB and offer new clues for treating MTB infection, but experimentally validated interactions are especially scarce in this regard. Herein we proposed an integrated framework that combined template-, domain-domain interaction-, and machine learning-based methods to predict MTB-human PPIs. As a result, we established a network composed of 13 758 PPIs including 451 MTB proteins and 3167 human proteins ( http://liulab.hzau.edu.cn/MTB/ ). Compared to known human targets of various pathogens, our predicted human targets show a similar tendency in terms of the network topological properties and enrichment in important functional genes. Additionally, these human targets largely have longer sequence lengths, more protein domains, more disordered residues, lower evolutionary rates, and older protein ages. Functional analysis demonstrates that these proteins show strong preferences toward phosphorylation, kinase activity, and signal transduction processes and toward disease- and immune-related pathways. Dissecting the cross-talk among top-ranked pathways suggests that the cancer pathway may serve as a bridge in MTB infection. Triplet analysis illustrates that the paired targets interacting with the same partner are adjacent to each other in the intraspecies network and tend to share similar expression patterns. Finally, we identified 36 potential anti-MTB human targets by integrating known drug target information and molecular properties of proteins.
|
91
|
Hazarika RR, Sostaric N, Sun Y, van Noort V. Large-scale docking predicts that sORF-encoded peptides may function through protein-peptide interactions in Arabidopsis thaliana. PLoS One 2018; 13:e0205179. [PMID: 30321192 PMCID: PMC6188750 DOI: 10.1371/journal.pone.0205179] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/08/2018] [Accepted: 09/20/2018] [Indexed: 02/07/2023] Open
Abstract
Several recent studies indicate that small Open Reading Frames (sORFs) embedded within multiple eukaryotic non-coding RNAs can be translated into bioactive peptides of up to 100 amino acids in size. However, the functional roles of the 607 Stress Induced Peptides (SIPs) previously identified from 189 Transcriptionally Active Regions (TARs) in Arabidopsis thaliana remain unclear. To provide a starting point for functional annotation of these plant-derived peptides, we performed a large-scale prediction of peptide binding sites on protein surfaces using coarse-grained peptide docking. The docked models were subjected to further atomistic refinement and binding energy calculations. A total of 530 peptide-protein pairs were successfully docked. In cases where a peptide encoded by a TAR is predicted to bind at a known ligand or cofactor-binding site within the protein, it can be assumed that the peptide modulates the ligand or cofactor-binding. Moreover, we predict that several peptides bind at protein-protein interfaces, which could therefore regulate the formation of the respective complexes. Protein-peptide binding analysis further revealed that peptides employ both their backbone and side chain atoms when binding to the protein, forming predominantly hydrophobic interactions and hydrogen bonds. In this study, we have generated novel predictions on the potential protein-peptide interactions in A. thaliana, which will help in further experimental validation.
Affiliation(s)
- Rashmi R. Hazarika
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
| | - Nikolina Sostaric
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
| | - Yifeng Sun
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Faculty of Engineering Technology, Campus Group T, KU Leuven, Leuven, Belgium
| | - Vera van Noort
- Department of Microbial and Molecular Systems, KU Leuven, Leuven, Belgium
- Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
| |
|
92
|
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023] Open
Abstract
Motivation Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S.cerevisiae and E.coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10-10) and -0.46 (P-value < 10-10) for S.cerevisiae and E.coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Amy I Gilson
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Niamh Durfee
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Hendrik Strobelt
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Kasper Dinkla
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Jeong-Mo Choi
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Hanspeter Pfister
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Eugene I Shakhnovich
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
|
93
|
Bordin N, González-Sánchez JC, Devos DP. PVCbase: an integrated web resource for the PVC bacterial proteomes. Database: The Journal of Biological Databases and Curation 2018; 2018:4985508. [PMID: 29718141 PMCID: PMC5915940 DOI: 10.1093/database/bay042] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/26/2018] [Accepted: 04/05/2018] [Indexed: 11/13/2022]
Abstract
Interest in the Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) bacterial superphylum is growing within the microbiology community. These organisms do not have a specialized web resource that gathers in silico predictions in an integrated fashion. Hence, we are providing the PVC community with PVCbase, a specialized web resource that gathers in silico predictions in an integrated fashion. PVCbase integrates protein function annotations obtained through sequence analysis and tertiary structure prediction for 39 representative PVC proteomes (PVCdb), a protein feature visualizer (Foundation) and a custom BLAST webserver (PVCBlast) that allows users to retrieve the annotation of a hit directly from the DataTables. We display results from various predictors, encompassing most functional aspects, allowing users to have a more comprehensive overview of protein identities. Additionally, we illustrate how PVCdb can be used to address biological questions from raw data. Database URL: PVCbase is freely accessible at www.pvcbacteria.org/pvcbase
Affiliation(s)
- Nicola Bordin
- Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera, Km. 1, Seville 41013, Spain
| | - Juan Carlos González-Sánchez
- CellNetworks, BioQuant, University of Heidelberg, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg (BZH), Heidelberg University, Im Neuenheimer Feld 328, 69120 Heidelberg, Germany
| | - Damien P Devos
- Centro Andaluz de Biología del Desarrollo, CSIC, Universidad Pablo de Olavide, Carretera de Utrera, Km. 1, Seville 41013, Spain
|
94
|
How is structural divergence related to evolutionary information? Mol Phylogenet Evol 2018; 127:859-866. [DOI: 10.1016/j.ympev.2018.06.033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/18/2017] [Revised: 06/01/2018] [Accepted: 06/19/2018] [Indexed: 12/15/2022]
|
95
|
Jubb HC, Saini HK, Verdonk ML, Forbes SA. COSMIC-3D provides structural perspectives on cancer genetics for drug discovery. Nat Genet 2018; 50:1200-1202. [PMID: 30158682 PMCID: PMC6159874 DOI: 10.1038/s41588-018-0214-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 11/09/2022]
Abstract
COSMIC-3D is a comprehensive integration of cancer mutations with protein structure across the human genome and structural proteome, seeking to support the identification and characterisation of protein targets for novel drug design in precision oncology. As an interactive system to explore cancer mutations in three-dimensions, COSMIC-3D is designed to enable a greater understanding of the functional impact of mutations, generate new hypotheses on which mutations are cancer drivers, and provide new opportunities for addressing these mutations pharmaceutically. This combination of genetics, structural proteomics, and drug development, can be best described as “mutation-guided drug design”.
Affiliation(s)
- Harry C Jubb
- COSMIC, Wellcome Sanger Institute, Cambridge, UK.
- Astex Pharmaceuticals, Cambridge, UK.
|
96
|
Albanese SK, Parton DL, Işık M, Rodríguez-Laureano L, Hanson SM, Behr JM, Gradia S, Jeans C, Levinson NM, Seeliger MA, Chodera JD. An Open Library of Human Kinase Domain Constructs for Automated Bacterial Expression. Biochemistry 2018; 57:4675-4689. [PMID: 30004690 PMCID: PMC6081246 DOI: 10.1021/acs.biochem.7b01081] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 01/12/2023]
Abstract
Kinases play a critical role in cellular signaling and are dysregulated in a number of diseases, such as cancer, diabetes, and neurodegeneration. Therapeutics targeting kinases currently account for roughly 50% of cancer drug discovery efforts. The ability to explore human kinase biochemistry and biophysics in the laboratory is essential to designing selective inhibitors and studying drug resistance. Bacterial expression systems are superior to insect or mammalian cells in terms of simplicity and cost effectiveness but have historically struggled with human kinase expression. Following the discovery that phosphatase coexpression produced high yields of Src and Abl kinase domains in bacteria, we have generated a library of 52 His-tagged human kinase domain constructs that express above 2 μg/mL of culture in an automated bacterial expression system utilizing phosphatase coexpression (YopH for Tyr kinases and lambda for Ser/Thr kinases). Here, we report a structural bioinformatics approach to identifying kinase domain constructs previously expressed in bacteria and likely to express well in our protocol, experiments demonstrating our simple construct selection strategy selects constructs with good expression yields in a test of 84 potential kinase domain boundaries for Abl, and yields from a high-throughput expression screen of 96 human kinase constructs. Using a fluorescence-based thermostability assay and a fluorescent ATP-competitive inhibitor, we show that the highest-expressing kinases are folded and have well-formed ATP binding sites. We also demonstrate that these constructs can enable characterization of clinical mutations by expressing a panel of 48 Src and 46 Abl mutations. The wild-type kinase construct library is available publicly via Addgene.
Affiliation(s)
- Steven K Albanese
- Louis V. Gerstner, Jr Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | - Daniel L Parton
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | - Mehtap Işık
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, New York 10065, United States
| | - Lucelenie Rodríguez-Laureano
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | - Sonya M Hanson
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | - Julie M Behr
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, New York 10065, United States
| | - Scott Gradia
- QB3MacroLab, University of California, Berkeley, California 94720, United States
| | - Chris Jeans
- QB3MacroLab, University of California, Berkeley, California 94720, United States
| | - Nicholas M Levinson
- Department of Pharmacology, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Markus A Seeliger
- Department of Pharmacological Sciences, Stony Brook University Medical School, Stony Brook, New York 11794, United States
| | - John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
|
97
|
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins which can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding the biological function. Since there are a limited number of folds and evolutionarily related proteins have a similar structure, function can be inferred through remote homology. Computational sequence searches were performed for remote homologues on genomes of ∼160 000 different organisms, starting from nearly 11 000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and reveals that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India.
|
98
|
Tian W, Chen C, Lei X, Zhao J, Liang J. CASTp 3.0: computed atlas of surface topography of proteins. Nucleic Acids Res 2018; 46:W363-W367. [PMID: 29860391 PMCID: PMC6031066 DOI: 10.1093/nar/gky473] [Citation(s) in RCA: 1122] [Impact Index Per Article: 187.0] [Received: 03/01/2018] [Revised: 05/04/2018] [Accepted: 05/17/2018] [Indexed: 12/23/2022] Open
Abstract
Geometric and topological properties of protein structures, including surface pockets, interior cavities and cross channels, are of fundamental importance for proteins to carry out their functions. Computed Atlas of Surface Topography of proteins (CASTp) is a web server that provides online services for locating, delineating and measuring these geometric and topological properties of protein structures. It has been widely used since its inception in 2003. In this article, we present the latest version of the web server, CASTp 3.0. CASTp 3.0 continues to provide reliable and comprehensive identifications and quantifications of protein topography. In addition, it now provides: (i) imprints of the negative volumes of pockets, cavities and channels, (ii) topographic features of biological assemblies in the Protein Data Bank, (iii) improved visualization of protein structures and pockets, and (iv) more intuitive structural and annotated information, including information of secondary structure, functional sites, variant sites and other annotations of protein residues. The CASTp 3.0 web server is freely accessible at http://sts.bioe.uic.edu/castp/.
Affiliation(s)
- Wei Tian
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Chang Chen
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Xue Lei
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Jieling Zhao
- Institut National de Recherche en Informatique et en Automatique, Paris 75012, France
| | - Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
|
99
|
Kleywegt GJ, Velankar S, Patwardhan A. Structural biology data archiving - where we are and what lies ahead. FEBS Lett 2018; 592:2153-2167. [PMID: 29749603 PMCID: PMC6019198 DOI: 10.1002/1873-3468.13086] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/28/2018] [Revised: 04/25/2018] [Accepted: 04/30/2018] [Indexed: 12/31/2022]
Abstract
For almost 50 years, structural biology has endeavoured to conserve and share its experimental data and their interpretations (usually, atomistic models) through global public archives such as the Protein Data Bank, Electron Microscopy Data Bank and Biological Magnetic Resonance Data Bank (BMRB). These archives are treasure troves of freely accessible data that document our quest for molecular or atomic understanding of biological function and processes in health and disease. They have prepared the field to tackle new archiving challenges as more and more (combinations of) techniques are being utilized to elucidate structure at ever increasing length scales. Furthermore, the field has made substantial efforts to develop validation methods that help users to assess the reliability of structures and to identify the most appropriate data for their needs. In this Review, we present an overview of public data archives in structural biology and discuss the importance of validation for users and producers of structural data. Finally, we sketch our efforts to integrate structural data with bioimaging data and with other sources of biological data. This will make relevant structural information available and more easily discoverable for a wide range of scientists.
Affiliation(s)
- Gerard J. Kleywegt
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Sameer Velankar
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Ardan Patwardhan
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
|
100
|
Kumar G, Mudgal R, Srinivasan N, Sandhya S. Use of designed sequences in protein structure recognition. Biol Direct 2018; 13:8. [PMID: 29776380 PMCID: PMC5960202 DOI: 10.1186/s13062-018-0209-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Accepted: 04/18/2018] [Indexed: 12/13/2022] Open
Abstract
Background Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. Results We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. Conclusion The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as ‘linkers’, where natural linkers between distant proteins are unavailable. Reviewers This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian. Electronic supplementary material The online version of this article (10.1186/s13062-018-0209-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gayatri Kumar
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
| | - Richa Mudgal
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India; Present address: Institute for Research in Biomedicine (IRB), Parc Cientific de Barcelona, C/ Baldiri Reixac 10, 08028, Barcelona, Spain
| | - Narayanaswamy Srinivasan
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.
| | - Sankaran Sandhya
- Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India.
|