51
Shamilov R, Robinson VL, Aneskievich BJ. Seeing Keratinocyte Proteins through the Looking Glass of Intrinsic Disorder. Int J Mol Sci 2021; 22:7912. [PMID: 34360678 PMCID: PMC8348711 DOI: 10.3390/ijms22157912] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/24/2021] [Revised: 06/28/2021] [Accepted: 07/20/2021] [Indexed: 02/06/2023]
Abstract
Epidermal keratinocyte proteins include many with an eccentric amino acid content (compositional bias), an atypical ultrastructural fate (built-in protease sensitivity), or assembly visible at the light microscope level (cytoplasmic granules). However, when considered through the looking glass of intrinsic disorder (ID), these apparent oddities seem quite expected. Keratinocyte proteins with highly repetitive motifs are of low complexity but high adaptation, providing polymers (e.g., profilaggrin) for proteolysis into bioactive derivatives, or monomers (e.g., loricrin) repeatedly cross-linked to self and other proteins to shield underlying tissue. Keratohyalin granules developing from liquid–liquid phase separation (LLPS) show that unique biomolecular condensates (BMC) and proteinaceous membraneless organelles (PMLO) occur in these highly customized cells. We conducted bioinformatic and in silico assessments of representative keratinocyte differentiation-dependent proteins. These proteins were assessed in the context of their demonstrated potential ID and the prospect that this characteristic drives formation of distinctive keratinocyte structures. Intriguingly, while ID is characteristic of many of these proteins, it does not appear to guarantee LLPS, nor is it required for incorporation into certain keratinocyte protein condensates. Further examination of keratinocyte-specific proteins will provide variations on the theme of PMLO, possibly recognizing new BMC and advancing the understanding of intrinsically disordered proteins as reflected by keratinocyte biology.
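The abstract ties highly repetitive motifs to low sequence complexity. One common, generic way to quantify such compositional bias is the Shannon entropy of residue composition in a sliding window. The sketch below is an illustration under stated assumptions, not the authors' pipeline: the window size (12) and entropy cutoff (2.2 bits) are hypothetical values chosen for the example.

```python
import math
from collections import Counter
from typing import Iterator, Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the residue composition of `seq`."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(
    seq: str, window: int = 12, max_entropy: float = 2.2
) -> Iterator[Tuple[int, int]]:
    """Yield 0-based half-open (start, end) windows whose entropy is <= max_entropy.

    `window` and `max_entropy` are hypothetical defaults, not published cutoffs.
    """
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            yield (start, start + window)


if __name__ == "__main__":
    repeat = "GGSGGSGGSGGS"           # toy glycine/serine repeat
    diverse = "ACDEFGHIKLMNPQRSTVWY"  # all 20 standard residues once
    print(round(shannon_entropy(repeat), 3))      # 0.918
    print(list(low_complexity_windows(repeat)))   # [(0, 12)]
    print(list(low_complexity_windows(diverse)))  # []
```

A repetitive stretch scores well below the maximum possible entropy for 20 residues (log2 20, about 4.32 bits), which is what flags it as compositionally biased.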
Affiliation(s)
- Rambon Shamilov
- Graduate Program in Pharmacology & Toxicology, Department of Pharmaceutical Sciences, University of Connecticut, 69 North Eagleville Road, Storrs, CT 06269, USA;
- Victoria L. Robinson
- Department of Molecular and Cellular Biology, College of Liberal Arts & Sciences, University of Connecticut, 91 North Eagleville Road, Storrs, CT 06269, USA;
- Brian J. Aneskievich
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Connecticut, Storrs, CT 06269, USA
- Correspondence: ; Tel.: +1-860-486-3053
52
Hu G, Katuwawala A, Wang K, Wu Z, Ghadermarzi S, Gao J, Kurgan L. flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions. Nat Commun 2021; 12:4438. [PMID: 34290238 PMCID: PMC8295265 DOI: 10.1038/s41467-021-24773-7] [Citation(s) in RCA: 140] [Impact Index Per Article: 46.7] [Received: 04/07/2021] [Accepted: 07/06/2021] [Indexed: 01/05/2023]
Abstract
Identification of intrinsic disorder in proteins relies in large part on computational predictors, which therefore need to be highly accurate. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn's webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/.
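Disorder predictors such as flDPnn report a per-residue disorder propensity, and a common convention for turning such scores into disordered regions is to threshold them. The sketch below assumes a list of scores in [0, 1], a cutoff of 0.5 and an optional minimum region length; all three are assumptions for illustration, and the code neither calls flDPnn nor reproduces its output format.

```python
from typing import List, Optional, Sequence, Tuple


def disordered_regions(
    scores: Sequence[float], threshold: float = 0.5, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) runs where score >= threshold.

    Runs shorter than `min_length` residues are discarded.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a disordered run begins
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


def disorder_fraction(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues predicted disordered (0.0 for an empty input)."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


if __name__ == "__main__":
    toy_scores = [0.1, 0.6, 0.7, 0.2, 0.9, 0.8, 0.95]  # hypothetical propensities
    print(disordered_regions(toy_scores))                # [(1, 3), (4, 7)]
    print(disordered_regions(toy_scores, min_length=3))  # [(4, 7)]
    print(round(disorder_fraction(toy_scores), 3))       # 0.714
```

A protein-level "fully disordered" call could then be made by comparing `disorder_fraction` with a cutoff; any such cutoff would be a further assumption.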
Affiliation(s)
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
53
Mulukala SKN, Kambhampati V, Qadri AH, Pasupulati AK. Evolutionary conservation of intrinsically unstructured regions in slit-diaphragm proteins. PLoS One 2021; 16:e0254917. [PMID: 34288970 PMCID: PMC8294545 DOI: 10.1371/journal.pone.0254917] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/13/2021] [Accepted: 07/06/2021] [Indexed: 01/19/2023]
Abstract
Vertebrate kidneys contribute to homeostasis by regulating electrolyte and acid-base balance, removing toxic metabolites from the blood, and preventing protein loss into the urine. Glomerular podocytes constitute the blood-urine barrier, and the podocyte slit-diaphragm (SD), a modified tight junction, contributes to glomerular permselectivity. Nephrin, KIRREL1, podocin, CD2AP, and TRPC6 are crucial members of the SD that interact with each other and contribute to the SD's structural and functional integrity. This study analyzed the distribution of these five essential SD proteins across the organisms for which genome sequences are available. We found a diverse distribution of nephrin and KIRREL1 ranging from nematodes to higher vertebrates, whereas podocin, CD2AP, and TRPC6 are restricted to the vertebrates. Among invertebrates, nephrin and its orthologs contain more immunoglobulin-3 domains, whereas in the vertebrates, CD80-like C2-set domains are predominant. In the case of KIRREL1 and its orthologs, more Ig domains were observed in invertebrates than in vertebrates. The Src Homology-3 (SH3) domain of CD2AP and the SPFH domain of podocin are highly conserved among vertebrates. TRPC6 and its orthologs had conserved ankyrin repeats, TRP, and ion transport domains, except in Chondrichthyes and Echinodermata, which lack the ankyrin repeats. Intrinsically unstructured regions (IURs) are conserved across the SD orthologs, suggesting that IURs are important in the protein complexes that constitute the slit-diaphragm. This is the first study to report evolutionary insights into vertebrate SD proteins and their invertebrate orthologs.
Affiliation(s)
- Sandeep K N Mulukala
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, India
- Vaishnavi Kambhampati
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, India
- Abrar H Qadri
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, India
- Anil K Pasupulati
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, India
54
Quaglia F, Lazar T, Hatos A, Tompa P, Piovesan D, Tosatto SCE. Exploring Curated Conformational Ensembles of Intrinsically Disordered Proteins in the Protein Ensemble Database. Curr Protoc 2021; 1:e192. [PMID: 34252246 DOI: 10.1002/cpz1.192] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
The Protein Ensemble Database (PED; https://proteinensemble.org/) is the major repository of conformational ensembles of intrinsically disordered proteins (IDPs). Conformational ensembles of IDPs are primarily provided by their authors or occasionally collected from the literature, and are subsequently deposited in PED along with the corresponding structured, manually curated metadata. The modeling of conformational ensembles usually relies on experimental data from small-angle X-ray scattering (SAXS), fluorescence resonance energy transfer (FRET), and NMR spectroscopy, on molecular dynamics (MD) simulations, or on a combination of these techniques. The growing number of scientific studies based on these data, along with the astounding and swift progress in the field of protein intrinsic disorder, has required a significant update and upgrade of PED, first published in 2014. To this end, the database was entirely renewed in 2020 and now has a dedicated team of biocurators who provide manually curated descriptions of the methods and conditions applied to generate the conformational ensembles and who check the consistency of the data. Here, we present a detailed description of how to explore PED with its protein pages and experimental pages, and how to interpret entries of conformational ensembles. We describe how to efficiently search conformational ensembles deposited in PED by means of its web interface and API. We demonstrate how to make sense of the PED protein page and its associated experimental entry pages with reference to the yeast Sic1 use case. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
- Basic Protocol 1: Performing a search in PED
- Support Protocol 1: Programmatic access with the PED API
- Basic Protocol 2: Interpreting the protein page and the experimental entry page (the Sic1 use case)
- Support Protocol 2: Downloading options
- Support Protocol 3: Understanding the validation report (the Sic1 use case)
- Basic Protocol 3: Submitting new conformational ensembles to PED
- Basic Protocol 4: Providing feedback in PED.
Affiliation(s)
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Tamas Lazar
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium; VIB-VUB Center for Structural Biology, Brussels, Belgium
- András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Peter Tompa
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium; VIB-VUB Center for Structural Biology, Brussels, Belgium; Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
55
Affiliation(s)
- Benjamin Lang
- Department of Structural Biology and the Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
- M Madan Babu
- Department of Structural Biology and the Center for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN, USA.
56
Fang S, Liu S, Shen J, Lu AZ, Wang AKY, Zhang Y, Li K, Liu J, Yang L, Hu CD, Wan J. Updated SARS-CoV-2 single nucleotide variants and mortality association. J Med Virol 2021; 93:6525-6534. [PMID: 34245452 PMCID: PMC8426680 DOI: 10.1002/jmv.27191] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/09/2021] [Revised: 07/05/2021] [Accepted: 07/07/2021] [Indexed: 12/29/2022]
Abstract
By analyzing newly collected SARS‐CoV‐2 genomes and comparing them with our previous study of SARS‐CoV‐2 single nucleotide variants (SNVs) before June 2020, we found that SNV clustering had changed remarkably since June 2020. Apart from one group of SNVs that became dominant, represented by two nonsynonymous mutations, A23403G (S:D614G) and C14408T (ORF1ab:P4715L), a few emerging groups of SNVs were recognized with sharply increased monthly incidence ratios of up to 70% in November 2020. Further investigation revealed sets of SNVs specific to patients' ages and/or gender, or strongly associated with mortality. Our logistic regression model explored features contributing to mortality status, including three critical SNVs, G25088T (S:V1176F), T27484C (ORF7a:L31L), and T25A (upstream of ORF1ab), age above 40 years, and male gender. The protein structure analysis indicated that the emerging subgroups of nonsynonymous SNVs and the mortality‐related ones were located on the protein surface. The clashes in protein structure introduced by these mutations might in turn affect viral pathogenesis by altering protein conformation, leading to differences in transmission and virulence. In particular, we found that nonsynonymous SNVs tended to occur in intrinsically disordered regions of Spike and ORF1ab and to significantly increase hydrophobicity, suggesting a potential role in changes of protein folding related to immune evasion.
There has been a considerable temporal change in SARS‐CoV‐2 SNV clustering since June 2020. Apart from one group of SNVs that became dominant, a few emerging groups of SNVs were recognized with sharply increased monthly occurrence ratios in November 2020. All of these individual SNVs could be traced back to February or March 2020, when they were first identified, suggesting a potential incubation period for these special groups of SNVs collectively. A total of 114 age‐specific SNVs were identified in one or across multiple age groups. Forty-two SNVs showed significantly higher rates in either males or females. Forty-one and 30 SNVs were observed with at least twofold higher incidence rates in the death and the nondeath group, respectively. A logistic regression model demonstrated that three critical SNVs, G25088T (S:V1176F), T27484C (ORF7a:L31L), and T25A (upstream of ORF1ab), age above 40 years, and male gender contribute to relatively higher mortality. The emerging subgroups of nonsynonymous SNVs and the mortality‐related ones were located on the protein surface. Nonsynonymous SNVs tended to occur in intrinsically disordered regions of Spike and ORF1ab.
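The abstract reports that nonsynonymous SNVs in disordered regions of Spike and ORF1ab tend to increase hydrophobicity. As a rough illustration of how such a change can be scored per substitution, the sketch below parses protein-level mutation labels written like those in the abstract (e.g. S:D614G) and returns the difference in Kyte-Doolittle hydropathy between the new and the original residue. The choice of the Kyte-Doolittle scale is an assumption; the abstract does not say which hydrophobicity measure the authors used.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Optional "PROTEIN:" prefix, then reference residue, position, new residue.
_MUTATION = re.compile(r"(?:[^:]+:)?([A-Z])(\d+)([A-Z])")


def hydropathy_change(mutation: str) -> float:
    """Kyte-Doolittle(new) - Kyte-Doolittle(old) for a label like 'S:D614G'."""
    match = _MUTATION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    ref, _position, alt = match.groups()
    if ref not in KYTE_DOOLITTLE or alt not in KYTE_DOOLITTLE:
        raise ValueError(f"non-standard residue in {mutation!r}")
    return KYTE_DOOLITTLE[alt] - KYTE_DOOLITTLE[ref]


if __name__ == "__main__":
    for label in ["S:D614G", "ORF1ab:P4715L", "S:V1176F", "ORF7a:L31L"]:
        print(label, round(hydropathy_change(label), 2))
```

On this scale, D614G gives +3.1 and P4715L gives +5.4 (more hydrophobic), V1176F gives -1.4, and the synonymous ORF7a:L31L gives 0.0. The pattern cannot tell protein labels from nucleotide labels such as T25A, so only protein-level labels should be passed to it.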
Affiliation(s)
- Shuyi Fang
- Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indiana University - Purdue University Indianapolis, Indianapolis, Indiana, USA
- Sheng Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA; Collaborative Core for Cancer Bioinformatics (C3B) shared by Indiana University Simon Comprehensive Cancer Center and Purdue University Center for Cancer Research, Indianapolis, Indiana, USA
- Jikui Shen
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Alex Z Lu
- Park Tudor School, Indianapolis, Indiana, USA
- Yucheng Zhang
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA; Collaborative Core for Cancer Bioinformatics (C3B) shared by Indiana University Simon Comprehensive Cancer Center and Purdue University Center for Cancer Research, Indianapolis, Indiana, USA
- Kailing Li
- Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indiana University - Purdue University Indianapolis, Indianapolis, Indiana, USA
- Juli Liu
- Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Lei Yang
- Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Chang-Deng Hu
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana, USA; Purdue University Center for Cancer Research, Purdue University, West Lafayette, Indiana, USA
- Jun Wan
- Department of BioHealth Informatics, Indiana University School of Informatics and Computing, Indiana University - Purdue University Indianapolis, Indianapolis, Indiana, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, USA; Collaborative Core for Cancer Bioinformatics (C3B) shared by Indiana University Simon Comprehensive Cancer Center and Purdue University Center for Cancer Research, Indianapolis, Indiana, USA; The Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, USA
57
Erdős G, Pajkos M, Dosztányi Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res 2021; 49:W297-W303. [PMID: 34048569 PMCID: PMC8262696 DOI: 10.1093/nar/gkab408] [Citation(s) in RCA: 248] [Impact Index Per Article: 82.7] [Received: 03/15/2021] [Revised: 04/21/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022]
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) exist without a single well-defined conformation. They carry out important biological functions with multifaceted roles, which are also reflected in their evolutionary behavior. Computational methods play important roles in the characterization of IDRs. One of the commonly used disorder prediction methods is IUPred, which relies on an energy estimation approach. The IUPred web server takes an amino acid sequence or a UniProt ID/accession as input and predicts the tendency of each amino acid to be in a disordered region, with an option to also predict context-dependent disordered regions. In this new iteration of IUPred, we added multiple novel features to enhance the prediction capabilities of the server. First, learning from the latest evaluation of disorder prediction methods, we introduced multiple new smoothing functions to the prediction that decrease noise and increase the performance of the predictions. We constructed a dataset of experimentally verified ordered/disordered regions with unambiguous annotations, which were added to the prediction. We also introduced a novel tool that enables the exploration of the evolutionary conservation of protein disorder coupled to sequence conservation in model organisms. The web server is freely available to users and accessible at https://iupred3.elte.hu.
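IUPred3 adds smoothing functions to reduce noise in per-residue predictions. The abstract does not specify those filters, so the sketch below uses a plain centered moving average as a stand-in; it is not IUPred3's own smoothing, and the window length is a hypothetical parameter. It shows the general effect: an isolated one-residue spike is damped and spread over its neighbors, while a sustained high-scoring stretch survives.

```python
from typing import List, Sequence


def moving_average(scores: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; windows are truncated at the sequence ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


if __name__ == "__main__":
    spike = [0.1, 0.1, 0.9, 0.1, 0.1]         # single noisy residue
    stretch = [0.1, 0.9, 0.9, 0.9, 0.9, 0.1]  # sustained disordered run
    print([round(x, 2) for x in moving_average(spike, window=3)])
    print([round(x, 2) for x in moving_average(stretch, window=3)])
```

With a window of 3, the spike's peak drops to about 0.37, below a 0.5 cutoff, whereas the core of the sustained run stays at 0.9.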
Affiliation(s)
- Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Mátyás Pajkos
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
58
Clerc I, Sagar A, Barducci A, Sibille N, Bernadó P, Cortés J. The diversity of molecular interactions involving intrinsically disordered proteins: A molecular modeling perspective. Comput Struct Biotechnol J 2021; 19:3817-3828. [PMID: 34285781 PMCID: PMC8273358 DOI: 10.1016/j.csbj.2021.06.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/19/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 01/15/2023]
Abstract
Intrinsically Disordered Proteins and Regions (IDPs/IDRs) are key components of a multitude of biological processes. Conformational malleability enables IDPs/IDRs to perform very specialized functions that cannot be accomplished by globular proteins. The functional role of most of these proteins is related to the recognition of other biomolecules to regulate biological processes or as part of signaling pathways. Depending on the extent of disorder, the number of interacting sites and the type of partner, very different architectures for the resulting assemblies are possible. More recently, molecular condensates with liquid-like properties composed of multiple copies of IDPs and nucleic acids have been shown to regulate key processes in eukaryotic cells. The structural and kinetic details of disordered biomolecular complexes are difficult to unveil experimentally due to their inherent conformational heterogeneity. Computational approaches, alone or in combination with experimental data, have emerged as indispensable tools for understanding the functional mechanisms of these elusive assemblies. The level of description used, all-atom or coarse-grained, strongly depends on the size of the molecular system and on the timescale of the mechanism under investigation. In this mini-review, we describe the most relevant architectures found for molecular interactions involving IDPs/IDRs and the computational strategies applied for their investigation.
Affiliation(s)
- Ilinka Clerc
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Amin Sagar
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Alessandro Barducci
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Nathalie Sibille
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
59
Dallago C, Schütze K, Heinzinger M, Olenyi T, Littmann M, Lu AX, Yang KK, Min S, Yoon S, Morton JT, Rost B. Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets. Curr Protoc 2021; 1:e113. [PMID: 33961736 DOI: 10.1002/cpz1.113] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 02/03/2023]
Abstract
Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time required by previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be used as input features for machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies for traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remains for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.
The following protocols are included in this manuscript:
- Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
- Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
- Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
- Basic Protocol 4: Train a machine learning classifier on protein embeddings
- Alternate Protocol 1: Generate 3D instead of 2D visualizations
- Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
- Support Protocol: Join embedding generation and sequence space visualization in a pipeline.
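The abstract notes that embeddings can serve as a proxy for homology-based inference. A minimal sketch of that idea is annotation transfer by nearest neighbor in embedding space: a query protein takes the label of the most similar labelled protein, measured by cosine similarity. The sketch assumes embeddings have already been computed (for instance with the bio_embeddings pipeline) and are available as plain lists of floats; it does not call bio_embeddings itself. The three-dimensional vectors and protein names are hypothetical, whereas real per-protein embeddings typically have hundreds to thousands of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def transfer_annotation(
    query: Sequence[float],
    reference: List[Tuple[str, Sequence[float], str]],
) -> Tuple[str, str, float]:
    """Return (name, label, similarity) of the reference entry closest to `query`."""
    if not reference:
        raise ValueError("reference set is empty")
    name, embedding, label = max(
        reference, key=lambda item: cosine_similarity(query, item[1])
    )
    return name, label, cosine_similarity(query, embedding)


if __name__ == "__main__":
    # Hypothetical labelled embeddings: (protein name, embedding, annotation).
    reference = [
        ("protA", [1.0, 0.0, 0.0], "nucleus"),
        ("protB", [0.0, 1.0, 0.0], "cytoplasm"),
        ("protC", [0.0, 0.0, 1.0], "membrane"),
    ]
    name, label, sim = transfer_annotation([0.9, 0.1, 0.0], reference)
    print(name, label, round(sim, 3))  # protA nucleus 0.994
```

Cosine similarity is used here because it ignores vector length; Euclidean distance would be an equally reasonable, and equally assumed, choice.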
Affiliation(s)
- Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
- Konstantin Schütze
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany
- Michael Heinzinger
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
- Tobias Olenyi
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany
- Maria Littmann
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
- Amy X Lu
- Department of Computer Science, University of Toronto, Toronto, Canada & Vector Institute
- Kevin K Yang
- Microsoft Research New England, Cambridge, Massachusetts
- Seonwoo Min
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
- Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- James T Morton
- Center for Computational Biology, Flatiron Institute, New York, New York
- Burkhard Rost
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, New York, New York; New York Consortium on Membrane Protein Structure (NYCOMPS), New York, New York
60
Song B, Li Z, Lin X, Wang J, Wang T, Fu X. Pretraining model for biological sequence data. Brief Funct Genomics 2021; 20:181-195. [PMID: 34050350 PMCID: PMC8194843 DOI: 10.1093/bfgp/elab025] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/23/2021] [Revised: 04/13/2021] [Accepted: 04/21/2021] [Indexed: 12/26/2022]
Abstract
With the development of high-throughput sequencing technology, biological sequence data reflecting life information have become increasingly accessible. Particularly against the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing disease mechanisms and discovering specific drugs. In recent years, pretraining models that emerged in natural language processing have attracted widespread attention in many research fields, not only because they decrease training cost but also because they improve performance on downstream tasks. Pretraining models are used to embed biological sequences and to extract features from large biological sequence corpora in order to comprehensively understand biological sequence data. In this survey, we provide a broad review of pretraining models for biological sequence data. We first introduce biological sequences and the corresponding datasets, including brief descriptions and accessible links. Subsequently, we systematically summarize popular pretraining models for biological sequences in four categories: CNN, word2vec, LSTM and Transformer. Then, we present some applications of the proposed pretraining models on downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.
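Of the four model families the survey lists, word2vec-style models treat a biological sequence as a "sentence" of short substrings (k-mers). The sketch below shows only that tokenization step; k = 3 and the use of three shifted, non-overlapping reading frames are illustrative assumptions rather than a scheme taken from the survey, and training the embedding model itself is left to a word2vec implementation of the reader's choice.

```python
from typing import List


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split `seq` into k-mers taken every `stride` positions; a tail shorter than k is dropped."""
    if k < 1 or stride < 1:
        raise ValueError("k and stride must be positive")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def shifted_frames(seq: str, k: int = 3) -> List[List[str]]:
    """Non-overlapping k-mer 'sentences', one per starting offset 0..k-1."""
    return [kmer_tokens(seq[offset:], k=k, stride=k) for offset in range(k)]


if __name__ == "__main__":
    protein = "MKTAYIA"  # hypothetical peptide
    print(kmer_tokens(protein))     # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA']
    print(shifted_frames(protein))  # [['MKT', 'AYI'], ['KTA', 'YIA'], ['TAY']]
```

Overlapping k-mers keep every local context but produce highly redundant tokens; the shifted non-overlapping frames trade that redundancy for several shorter sentences per sequence.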
Affiliation(s)
- Xiangzheng Fu
- Corresponding author: Xiangzheng Fu, College of Information Science and Engineering, Hunan University, Changsha, Hunan, China. Tel: 86-0731-88821907; E-mail:
61
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022]
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are largely coupled to GC/AT bias, exhibiting a bifurcated correlation with the degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer homocodons and homopeptides simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating that they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and homopeptide levels correlate with intrinsic disorder and anti-correlate with structured-domain content. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine, whose homocodons are overwhelmingly made from the codon AAG.
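Homopeptides and homocodons, as defined in the abstract, are runs of a single amino acid and runs of a single repeated codon. A minimal sketch for finding both, plus GC content, follows. The minimum run length of 5 is a hypothetical cutoff (the abstract does not state the one used), and positions are 0-based, counted in residues for proteins and in codons for coding sequences.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple

Run = Tuple[Hashable, int, int]  # (repeated item, start position, run length)


def _runs(items: Sequence[Hashable], min_len: int) -> List[Run]:
    """Runs of identical consecutive items that are at least `min_len` long."""
    runs: List[Run] = []
    pos = 0
    for item, group in groupby(items):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((item, pos, length))
        pos += length
    return runs


def homopeptides(protein: str, min_len: int = 5) -> List[Run]:
    """Runs of one amino acid, e.g. poly-Q or poly-K."""
    return _runs(protein, min_len)


def homocodons(cds: str, min_len: int = 5) -> List[Run]:
    """Runs of one codon in an in-frame coding sequence (a trailing partial codon is ignored)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return _runs(codons, min_len)


def gc_content(seq: str) -> float:
    """Fraction of G or C nucleotides (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    cds = "ATG" + "AAG" * 5 + "AAA" + "TAA"  # hypothetical ORF: Met, six Lys, stop
    protein = "MKKKKKK"                     # its translation (stop omitted)
    print(homopeptides(protein))            # [('K', 1, 6)]
    print(homocodons(cds))                  # [('AAG', 1, 5)]
    print(homocodons(cds, min_len=6))       # []
    print(gc_content(cds))                  # 0.25
```

Because AAA and AAG both encode lysine, the toy ORF carries a six-residue poly-K homopeptide but only a five-codon AAG homocodon, which is why the two measures can diverge.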
62
Variants in GCNA, X-linked germ-cell genome integrity gene, identified in men with primary spermatogenic failure. Hum Genet 2021; 140:1169-1182. [PMID: 33963445 DOI: 10.1007/s00439-021-02287-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/05/2021] [Accepted: 04/23/2021] [Indexed: 01/25/2023]
Abstract
Male infertility affects millions of couples, yet the etiology of primary infertility remains largely unknown. A critical element of successful spermatogenesis is maintenance of genome integrity. Here, we present a genomic study of spermatogenic failure (SPGF). Our initial analysis (n = 176) did not reveal known candidate genes but identified a potentially significant single-nucleotide variant (SNV) in X-linked germ-cell nuclear antigen (GCNA). Together with a larger follow-up study (n = 2049), 7 likely clinically relevant GCNA variants were identified. GCNA is critical for genome integrity in male meiosis, and knockout models exhibit impaired spermatogenesis and infertility. Single-cell RNA-seq and immunohistochemistry confirm human GCNA expression from spermatogonia to elongated spermatids. Five identified SNVs were located in key functional regions, including the N-terminal SUMO-interacting motif and the C-terminal Spartan-like protease domain. Notably, variant p.Ala115ProfsTer7 results in an early frameshift, while the Spartan-like domain missense variants p.Ser659Trp and p.Arg664Cys change conserved residues, likely affecting 3D structure. For variants within GCNA's intrinsically disordered region, we performed computational modeling of consensus motifs. Two SNVs were predicted to affect the structure of these consensus motifs. All identified variants have an extremely low minor allele frequency in the general population, and 6 of 7 were not detected in > 5000 biological fathers. Considering evidence from animal models, germ-cell-specific expression, 3D modeling, and computational predictions for SNVs, we propose that the identified GCNA variants disrupt the structure and function of the respective protein domains, ultimately arresting germ-cell division. To our knowledge, this is the first study implicating GCNA, a key genome integrity factor, in human male infertility.
63
Dyrka W, Gąsior-Głogowska M, Szefczyk M, Szulc N. Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars. BMC Bioinformatics 2021; 22:222. [PMID: 33926372 PMCID: PMC8086366 DOI: 10.1186/s12859-021-04139-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2020] [Accepted: 04/19/2021] [Indexed: 11/16/2022]
Abstract
Background Amyloid signaling motifs are a class of protein motifs which share basic structural and functional features despite the lack of clear sequence homology. They are hard to detect in large sequence databases either with the alignment-based profile methods (due to short length and diversity) or with generic amyloid- and prion-finding tools (due to insufficient discriminative power). We propose to address the challenge with a machine learning grammatical model capable of generalizing over diverse collections of unaligned yet related motifs. Results First, we introduce and test improvements to our probabilistic context-free grammar framework for protein sequences that allow for inferring more sophisticated models achieving high sensitivity at low false positive rates. Then, we infer universal grammars for a collection of recently identified bacterial amyloid signaling motifs and demonstrate that the method is capable of generalizing by successfully searching for related motifs in fungi. The results are compared to available alternative methods. Finally, we conduct spectroscopy and staining analyses of selected peptides to verify their structural and functional relationship. Conclusions While the profile HMMs remain the method of choice for modeling homologous sets of sequences, PCFGs seem more suitable for building meta-family descriptors and extrapolating beyond the seed sample. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04139-y.
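The abstract relies on probabilistic context-free grammars (PCFGs) to score protein sequences. As a minimal sketch, assuming a toy grammar in Chomsky normal form (the rules, probabilities and two-letter Q/N alphabet below are hypothetical, not the grammars inferred by the authors), the standard inside algorithm computes the probability of a sequence by summing over all of its parse trees:

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form; probabilities of all S rules sum to 1.
BINARY_RULES = {("S", ("S", "S")): 0.3}             # S -> S S
LEXICAL_RULES = {("S", "Q"): 0.4, ("S", "N"): 0.3}  # S -> 'Q' | 'N'


def inside_probability(seq, binary_rules, lexical_rules, start="S"):
    """Return P(seq | grammar) via the CYK-style inside algorithm."""
    n = len(seq)
    if n == 0:
        return 0.0
    # inside[i][j][A]: probability that nonterminal A derives seq[i..j] (inclusive)
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(seq):
        for (lhs, terminal), p in lexical_rules.items():
            if terminal == symbol:
                inside[i][i][lhs] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # left part seq[i..k], right part seq[k+1..j]
                left, right = inside[i][k], inside[k + 1][j]
                for (lhs, (b, c)), p in binary_rules.items():
                    if b in left and c in right:
                        inside[i][j][lhs] += p * left[b] * right[c]
    return inside[0][n - 1][start]


# "QNQ" has two parse trees, each with probability 0.3 * 0.3 * 0.4 * 0.3 * 0.4
print(inside_probability("QNQ", BINARY_RULES, LEXICAL_RULES))
```

Estimating rule probabilities from unaligned motifs (the learning step described in the abstract) is not shown; this sketch only scores a sequence under fixed rules.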
Affiliation(s)
- Witold Dyrka
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
- Marlena Gąsior-Głogowska
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
- Monika Szefczyk
- Faculty of Chemistry, Department of Bioorganic Chemistry, Wrocław University of Science and Technology, Wrocław, Poland
- Natalia Szulc
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
64
Katuwawala A, Ghadermarzi S, Hu G, Wu Z, Kurgan L. QUARTERplus: Accurate disorder predictions integrated with interpretable residue-level quality assessment scores. Comput Struct Biotechnol J 2021; 19:2597-2606. [PMID: 34025946 PMCID: PMC8122155 DOI: 10.1016/j.csbj.2021.04.066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Revised: 04/24/2021] [Accepted: 04/24/2021] [Indexed: 12/13/2022]
Abstract
A recent advance in the disorder prediction field is the development of quality assessment (QA) scores. QA scores complement the propensities produced by disorder predictors by identifying regions where these predictions are more likely to be correct. We develop, empirically test and release a new QA tool, QUARTERplus, that addresses several key drawbacks of the current QA method, QUARTER. QUARTERplus is the first solution that utilizes QA scores and the associated input disorder predictions to produce very accurate disorder predictions with the help of a modern deep learning meta-model. The deep neural network utilizes the QA scores to identify and fix the regions where the original/input disorder predictions are poor. More importantly, the accurate QUARTERplus predictions are accompanied by easy-to-interpret residue-level QA scores that reliably quantify their residue-level predictive quality. We provide these interpretable QA scores for QUARTERplus and 10 other popular disorder predictors. Empirical tests on a large and independent (low similarity) test dataset show that QUARTERplus predictions achieve AUC = 0.93 and are statistically more accurate than the results of twelve state-of-the-art disorder predictors. We also demonstrate that the new QA scores produced by QUARTERplus are highly correlated with the actual predictive quality and that they can be effectively used to identify regions of correct disorder predictions. This feature enables users to easily identify which parts of the predictions generated by modern disorder predictors are more trustworthy. QUARTERplus is available as a convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTERplus/.
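QUARTERplus is evaluated with the area under the ROC curve (AUC = 0.93). As a minimal sketch of how a residue-level AUC is computed from predicted disorder propensities and binary annotations (the function name and toy values are hypothetical; this is the standard rank-based definition, not code from the QUARTERplus server):

```python
def residue_auc(propensities, labels):
    """ROC AUC for per-residue disorder propensities.

    labels: 1 = annotated disordered, 0 = annotated ordered.
    Computed as the probability that a random disordered residue receives a
    higher propensity than a random ordered one (ties count as 0.5).
    """
    pos = [s for s, y in zip(propensities, labels) if y == 1]
    neg = [s for s, y in zip(propensities, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one disordered and one ordered residue")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)) pairwise comparison; fine for a sketch
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 of the 4 disordered/ordered pairs are ranked correctly.
print(residue_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```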
Affiliation(s)
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
65
Hatos A, Quaglia F, Piovesan D, Tosatto SCE. APICURON: a database to credit and acknowledge the work of biocurators. Database (Oxford) 2021; 2021:baab019. [PMID: 33882120 PMCID: PMC8060004 DOI: 10.1093/database/baab019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/12/2021] [Accepted: 04/12/2021] [Indexed: 11/14/2022]
Abstract
APICURON is an open and freely accessible resource that tracks and credits the work of biocurators across multiple participating knowledgebases. Biocuration is essential to extract knowledge from research data and make it available in a structured and standardized way to the scientific community. However, processing biological data, mainly from the literature, requires a huge effort that is difficult to attribute and quantify. APICURON collects biocuration events from third-party resources and aggregates this information, spotlighting biocurator contributions. APICURON promotes biocurator engagement by implementing gamification concepts like badges, medals and leaderboards, and at the same time provides a monitoring service for registered resources and for biocurators themselves. APICURON adopts a data model that is flexible enough to represent and track the majority of biocuration activities. Biocurators are identified through their Open Researcher and Contributor ID (ORCID). The definitions of curation events, scoring systems and rules for assigning badges and medals are resource-specific and easily customizable. Registered resources can transfer curation activities on the fly through a secure and robust Application Programming Interface (API). Here, we show how simple and effective it is to connect a resource to APICURON, describing the DisProt database of intrinsically disordered proteins as a use case. We believe APICURON will provide biological knowledgebases with a service to recognize and credit the effort of their biocurators, monitor their activity and promote curator engagement. Database URL: https://apicuron.org.
Affiliation(s)
- András Hatos
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padova 35131, Italy
66
Dannenhoffer-Lafage T, Best RB. A Data-Driven Hydrophobicity Scale for Predicting Liquid-Liquid Phase Separation of Proteins. J Phys Chem B 2021; 125:4046-4056. [PMID: 33876938 DOI: 10.1021/acs.jpcb.0c11479] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Indexed: 12/22/2022]
Abstract
An accurate model for macroscale disordered assemblies of biological macromolecules, such as those formed in so-called membraneless organelles, would greatly assist in studying their structure, function, and dynamics. Recent evidence has suggested that liquid-liquid phase separation (LLPS) underlies the formation of membraneless organelles. While the exchange of macromolecule/water interactions for macromolecule/macromolecule interactions is known to be the general driving force for LLPS, the specific interactions involved are not well understood. Historically, protein-water and protein-protein interactions have been understood via hydrophobicity scales. However, these scales are typically optimized for describing these relative interactions in specific cases, such as protein folding or insertion of proteins into membranes. To better describe the relative interactions of proteins that undergo LLPS, we have developed a new, data-driven hydrophobicity scale. To determine the new scale, we ran coarse-grained molecular dynamics simulations with the hydrophobicity scale coarse-grained model, which relates the interactions between amino acids to their hydrophobicity. Hydrophobicity values were determined via the force-balance method on a library of proteins that includes unfolded, intrinsically disordered, and phase-separating proteins (PSP). When used in coarse-grained molecular dynamics simulations, the resulting hydrophobicity scale predicts whether a given protein will undergo LLPS at physiological conditions better than existing hydrophobicity scales. This new scale confirms that π-π interactions between amino acids are important drivers of LLPS. It provides a convenient and compact description of protein-protein interactions for proteins that undergo LLPS and could be used to develop new models to describe interactions between PSP and other components, such as nucleic acids.
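In the hydrophobicity scale (HPS) coarse-grained model referred to above, each residue type carries a hydrophobicity value λ that scales its pairwise attraction. A minimal sketch of the Ashbaugh-Hatch pair potential used in HPS-type models, assuming arithmetic-mean combining rules and ε = 0.2 kcal/mol as in the original HPS parametrization; the λ and σ values in the example are placeholders, not the data-driven scale reported in this paper:

```python
# Assumed units: distances in nm, energies in kcal/mol.
EPSILON = 0.2  # assumed interaction strength, as in the original HPS model


def lennard_jones(r, sigma, epsilon=EPSILON):
    """Standard 12-6 Lennard-Jones energy."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def hps_pair_energy(r, lambda_i, lambda_j, sigma_i, sigma_j, epsilon=EPSILON):
    """Ashbaugh-Hatch pair energy between residues i and j at distance r."""
    lam = 0.5 * (lambda_i + lambda_j)  # hydrophobicity of the pair
    sigma = 0.5 * (sigma_i + sigma_j)  # pair diameter
    u_lj = lennard_jones(r, sigma, epsilon)
    if r <= 2.0 ** (1.0 / 6.0) * sigma:
        # Repulsive core, shifted so the well depth at the LJ minimum is lam * epsilon
        return u_lj + (1.0 - lam) * epsilon
    # Attractive tail scaled by the pair hydrophobicity
    return lam * u_lj


# Placeholder residues with lambda = 0.6 and 0.8 and sigma = 0.6 nm
r_min = 2.0 ** (1.0 / 6.0) * 0.6
print(hps_pair_energy(r_min, 0.6, 0.8, 0.6, 0.6))  # about -0.7 * 0.2 = -0.14
```

A data-driven scale such as the one in this paper would replace the placeholder λ values; fitting them by the force-balance method is not shown.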
Affiliation(s)
- Thomas Dannenhoffer-Lafage
- Laboratory of Chemical Physics, National Institute for Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, United States
- Robert B Best
- Laboratory of Chemical Physics, National Institute for Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, United States
67
Identification of Intrinsically Disordered Protein Regions Based on Deep Neural Network-VGG16. ALGORITHMS 2021. [DOI: 10.3390/a14040107] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/12/2023]
Abstract
The accurate identification of intrinsically disordered proteins or protein regions is of great importance, as they are involved in critical biological processes and related to various human diseases. In this paper, we develop a deep neural network based on the well-known VGG16. Our deep neural network is trained using 1450 proteins from the DIS1616 dataset and tested on the remaining 166 proteins. It is also tested on the blind test sets R80 and MXD494 to further demonstrate the performance of our model. The MCC values of our trained deep neural network are 0.5132 on the test set DIS166, 0.5270 on the blind test set R80, and 0.4577 on the blind test set MXD494. All of these MCC values exceed the corresponding values of existing prediction methods.
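The abstract reports performance as the Matthews correlation coefficient (MCC). A minimal sketch of the standard MCC formula applied to residue-level confusion counts, assuming disordered residues are the positive class (the counts in the example are hypothetical):

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # a common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 disordered residues found, 10 missed,
# 45 ordered residues correctly rejected, 5 falsely called disordered.
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 4))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when disordered and ordered residues are imbalanced.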
68
Peng Z, Xing Q, Kurgan L. APOD: accurate sequence-based predictor of disordered flexible linkers. Bioinformatics 2021; 36:i754-i761. [PMID: 33381830 PMCID: PMC7773485 DOI: 10.1093/bioinformatics/btaa808] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 09/07/2020] [Indexed: 12/21/2022]
Abstract
Motivation Disordered flexible linkers (DFLs) are abundant and functionally important intrinsically disordered regions that connect protein domains and structural elements within domains, and which facilitate disorder-based allosteric regulation. Although computational estimates suggest that thousands of proteins have DFLs, they have been annotated experimentally in fewer than 200 proteins. This substantial annotation gap can be reduced with the help of accurate computational predictors. The sole predictor of DFLs, DFLpred, trades off accuracy for shorter runtime by excluding relevant but computationally costly predictive inputs. Moreover, it relies on local/window-based information and does not consider useful protein-level characteristics. Results We conceptualize, design and test APOD (Accurate Predictor Of DFLs), the first highly accurate predictor that utilizes both local- and protein-level inputs that quantify propensity for disorder, sequence composition, sequence conservation and selected putative structural properties. Consequently, APOD offers significantly more accurate predictions when compared with its faster predecessor, DFLpred, and several other alternative ways to predict DFLs. These improvements stem from the use of a more comprehensive set of inputs that cover protein-level information and from the application of a more sophisticated predictive model, a well-parametrized support vector machine. APOD achieves an area under the curve of 0.82 (a 28% improvement over DFLpred) and a Matthews correlation coefficient of 0.42 (a 180% increase over DFLpred) when tested on an independent/low-similarity test dataset. APOD is therefore a suitable choice for accurate, small-scale prediction of DFLs. Availability and implementation https://yanglab.nankai.edu.cn/APOD/.
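The relative gains quoted for APOD imply approximate DFLpred scores on the same test set. A small worked calculation, assuming the gains are expressed as (APOD - DFLpred) / DFLpred; the back-calculated values are only as precise as the rounded percentages, and the helper name is hypothetical:

```python
def implied_baseline(new_value, relative_gain_percent):
    """Invert new = baseline * (1 + gain / 100) to recover the baseline."""
    return new_value / (1.0 + relative_gain_percent / 100.0)


auc_dflpred = implied_baseline(0.82, 28)   # 0.82 / 1.28, roughly 0.64
mcc_dflpred = implied_baseline(0.42, 180)  # 0.42 / 2.80, roughly 0.15
print(f"implied DFLpred AUC ~ {auc_dflpred:.2f}, MCC ~ {mcc_dflpred:.2f}")
```

Read this way, a 180% increase means APOD's MCC is 2.8 times that of DFLpred, not 1.8 times.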
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China; School of Statistics and Data Science, Nankai University, Tianjin 300074, China
- Qian Xing
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
69
Zhao B, Katuwawala A, Uversky VN, Kurgan L. IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell. Cell Mol Life Sci 2021; 78:2371-2385. [PMID: 32997198 PMCID: PMC11071772 DOI: 10.1007/s00018-020-03654-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/08/2020] [Revised: 09/09/2020] [Accepted: 09/22/2020] [Indexed: 12/11/2022]
Abstract
Intrinsic disorder can be found in the proteomes of all kingdoms of life and in viruses, and it is particularly prevalent in eukaryotes. We conduct a comprehensive analysis of intrinsic disorder in human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and the presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, and that proteins localized to few subcellular compartments are more disordered than proteins localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
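The two protein-level measures used in this study, disorder content and the presence of long intrinsically disordered regions, can both be derived from per-residue binary disorder predictions. A minimal sketch, assuming 1 marks a disordered residue and assuming a 30-residue cutoff for a long region (a common convention, not necessarily the cutoff used by the authors); the function names are hypothetical:

```python
def disorder_content(labels):
    """Fraction of residues predicted disordered (1 = disordered, 0 = ordered)."""
    return sum(labels) / len(labels) if labels else 0.0


def long_disordered_regions(labels, min_length=30):
    """Half-open (start, end) index pairs of disordered runs of >= min_length residues."""
    regions, start = [], None
    for i, y in enumerate(list(labels) + [0]):  # trailing 0 closes a final run
        if y == 1 and start is None:
            start = i
        elif y != 1 and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical 70-residue protein with a 35-residue and a 20-residue disordered run
labels = [0] * 10 + [1] * 35 + [0] * 5 + [1] * 20
print(disorder_content(labels))         # 55 of 70 residues, about 0.79
print(long_disordered_regions(labels))  # [(10, 45)]
```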
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, 33612, USA.
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia.
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
70
Monzon AM, Bonato P, Necci M, Tosatto SCE, Piovesan D. FLIPPER: Predicting and Characterizing Linear Interacting Peptides in the Protein Data Bank. J Mol Biol 2021; 433:166900. [PMID: 33647288 DOI: 10.1016/j.jmb.2021.166900] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/03/2020] [Revised: 02/22/2021] [Accepted: 02/22/2021] [Indexed: 12/31/2022]
Abstract
A large fraction of peptides or protein regions are disordered in isolation and fold upon binding. These regions, also called MoRFs, SLiMs or LIPs, are often associated with signaling and regulation processes. However, despite their importance, only a limited number of examples are available in public databases, and their automatic detection at the proteome level is problematic. Here we present FLIPPER, an automatic method for the detection of structurally linear sub-regions or peptides that interact with another chain in a protein complex. FLIPPER is a random forest classifier that takes the protein structure as input and provides the propensity of each amino acid to be part of a LIP region. Models are built taking into consideration structural features such as intra- and inter-chain contacts, secondary structure, solvent accessibility in both the bound and unbound states, structural linearity and chain length. FLIPPER is accurate when evaluated on non-redundant independent datasets: 99% precision and 99% sensitivity on PixelDB-25, and 87% precision and 88% sensitivity on DIBS-25. Finally, we used FLIPPER to process the entire Protein Data Bank and identified different classes of LIPs based on different binding modes and partner molecules. We provide a detailed description of these LIP categories and show that a large fraction of these regions are not detected by disorder predictors. All FLIPPER predictions are integrated in the MobiDB 4.0 database.
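The FLIPPER evaluation is summarized as precision and sensitivity. A minimal sketch of both measures, assuming residue-level evaluation in which predicted and annotated LIP residues are given as sets of residue indices (the function name and toy sets are hypothetical):

```python
def precision_sensitivity(predicted, annotated):
    """Precision = TP / predicted positives; sensitivity (recall) = TP / annotated positives."""
    tp = len(predicted & annotated)  # residues both predicted and annotated as LIP
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(annotated) if annotated else 0.0
    return precision, sensitivity


# Toy example: 3 of 4 predicted residues are annotated; 3 of 5 annotated are found.
print(precision_sensitivity({1, 2, 3, 4}, {2, 3, 4, 5, 6}))  # (0.75, 0.6)
```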
Affiliation(s)
- Paolo Bonato
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy.
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
71
Shen B, Chen Z, Yu C, Chen T, Shi M, Li T. Computational Screening of Phase-separating Proteins. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:13-24. [PMID: 33610793 PMCID: PMC8498823 DOI: 10.1016/j.gpb.2020.11.003] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/11/2019] [Revised: 11/17/2020] [Accepted: 12/10/2020] [Indexed: 11/27/2022]
Abstract
Phase separation is an important mechanism that mediates the compartmentalization of proteins in cells. Proteins that can undergo phase separation in cells share certain typical sequence features, like intrinsically disordered regions (IDRs) and multiple modular domains. Sequence-based analysis tools are commonly used in the screening of these proteins. However, current phase separation predictors are mostly designed for IDR-containing proteins and thus inevitably overlook phase-separating proteins with relatively low IDR content. Features other than amino acid sequence could provide crucial information for identifying possible phase-separating proteins: protein–protein interaction (PPI) networks reveal the multivalent interactions that underlie the phase separation process; post-translational modifications (PTMs) are crucial in regulating phase separation behavior; and spherical structures revealed in immunofluorescence (IF) images indicate condensed droplets formed by phase-separating proteins, distinguishing these proteins from non-phase-separating proteins. Here, we summarize the sequence-based tools for predicting phase-separating proteins and highlight the importance of incorporating PPIs, PTMs, and IF images into phase separation prediction in future studies.
Affiliation(s)
- Boyan Shen
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Zhaoming Chen
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Chunyu Yu
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China; Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Taoyu Chen
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Minglei Shi
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, School of Medicine, Tsinghua University, Beijing 100084, China
- Tingting Li
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China; Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China.
72
Lazar T, Martínez-Pérez E, Quaglia F, Hatos A, Chemes L, Iserte JA, Méndez NA, Garrone NA, Saldaño T, Marchetti J, Rueda A, Bernadó P, Blackledge M, Cordeiro TN, Fagerberg E, Forman-Kay JD, Fornasari M, Gibson TJ, Gomes GNW, Gradinaru C, Head-Gordon T, Jensen MR, Lemke E, Longhi S, Marino-Buslje C, Minervini G, Mittag T, Monzon A, Pappu RV, Parisi G, Ricard-Blum S, Ruff KM, Salladini E, Skepö M, Svergun D, Vallet S, Varadi M, Tompa P, Tosatto SCE, Piovesan D. PED in 2021: a major update of the protein ensemble database for intrinsically disordered proteins. Nucleic Acids Res 2021; 49:D404-D411. [PMID: 33305318 PMCID: PMC7778965 DOI: 10.1093/nar/gkaa1021] [Citation(s) in RCA: 80] [Impact Index Per Article: 26.7] [Received: 09/14/2020] [Revised: 10/13/2020] [Accepted: 12/08/2020] [Indexed: 12/21/2022]
Abstract
The Protein Ensemble Database (PED) (https://proteinensemble.org), which holds structural ensembles of intrinsically disordered proteins (IDPs), has been significantly updated and upgraded since its last release in 2016. The new version, PED 4.0, has been completely redesigned and reimplemented with cutting-edge technology and now holds about six times more data (162 versus 24 entries and 242 versus 60 structural ensembles) and a broader representation of state-of-the-art ensemble generation methods than the previous version. The database has a completely renewed graphical interface with an interactive feature viewer for region-based annotations, and provides a series of descriptors of the qualitative and quantitative properties of the ensembles. High data quality is guaranteed by a new submission process, which combines both automatic and manual evaluation steps. A team of biocurators integrates structured metadata describing the ensemble generation methodology, experimental constraints and conditions. A new search engine allows the user to build advanced queries and search all entry fields, including cross-references to IDP-related resources such as DisProt, MobiDB, BMRB and SASBDB. We expect that the renewed PED will be useful for researchers interested in the atomic-level understanding of IDP function, and will promote the rational, structure-based design of IDP-targeting drugs.
Affiliation(s)
- Tamas Lazar
- VIB-VUB Center for Structural Biology, Flanders Institute for Biotechnology, Brussels 1050, Belgium
- Structural Biology Brussels, Bioengineering Sciences Department, Vrije Universiteit Brussel, Brussels 1050, Belgium
- Elizabeth Martínez-Pérez
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Federica Quaglia
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
- András Hatos
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
- Lucía B Chemes
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
- Javier A Iserte
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
- Nicolás A Méndez
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
- Nicolás A Garrone
- Instituto de Investigaciones Biotecnológicas “Dr. Rodolfo A. Ugalde”, IIB-UNSAM, IIBIO-CONICET, Universidad Nacional de San Martín, CP1650 San Martín, Buenos Aires, Argentina
- Tadeo E Saldaño
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
- Julia Marchetti
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
- Ana Julia Velez Rueda
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
- Pau Bernadó
- Centre de Biochimie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier 34090, France
- Tiago N Cordeiro
- Centre de Biochimie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier 34090, France
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras 2780-157, Portugal
- Eric Fagerberg
- Theoretical Chemistry, Lund University, Lund, POB 124, SE-221 00, Sweden
- Julie D Forman-Kay
- Molecular Medicine Program, Hospital for Sick Children, Toronto, M5G 1X8, Ontario, Canada
- Department of Biochemistry, University of Toronto, Toronto, M5S 1A8, Ontario, Canada
- Maria S Fornasari
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
- Toby J Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Gregory-Neal W Gomes
- Department of Physics, University of Toronto, Toronto, M5S 1A7, Ontario, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, L5L 1C6, Ontario, Canada
- Claudiu C Gradinaru
- Department of Physics, University of Toronto, Toronto, M5S 1A7, Ontario, Canada
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, Mississauga, L5L 1C6, Ontario, Canada
- Teresa Head-Gordon
- Departments of Chemistry, Bioengineering, Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
- Edward A Lemke
- Biocentre, Johannes Gutenberg-University Mainz, Mainz 55128, Germany
- Institute of Molecular Biology, Mainz 55128, Germany
- Sonia Longhi
- Aix-Marseille University, CNRS, Architecture et Fonction des Macromolécules Biologiques (AFMB), Marseille 13288, France
- Tanja Mittag
- Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- Rohit V Pappu
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, MO 63130, USA
- Gustavo Parisi
- Laboratorio de Química y Biología Computacional, Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal B1876BXD, Buenos Aires, Argentina
- Sylvie Ricard-Blum
- Univ Lyon, University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, Villeurbanne, 69629 Lyon Cedex 07, France
- Kiersten M Ruff
- Department of Biomedical Engineering, Center for Science & Engineering of Living Systems (CSELS), Washington University in St. Louis, MO 63130, USA
- Edoardo Salladini
- Aix-Marseille University, CNRS, Architecture et Fonction des Macromolécules Biologiques (AFMB), Marseille 13288, France
- Marie Skepö
- Theoretical Chemistry, Lund University, Lund, POB 124, SE-221 00, Sweden
- LINXS - Lund Institute of Advanced Neutron and X-ray Science, Lund 223 70, Sweden
- Dmitri Svergun
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Sylvain D Vallet
- Univ Lyon, University Claude Bernard Lyon 1, CNRS, INSA Lyon, CPE, Institute of Molecular and Supramolecular Chemistry and Biochemistry (ICBMS), UMR 5246, Villeurbanne, 69629 Lyon Cedex 07, France
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Peter Tompa
- To whom correspondence should be addressed. Tel +32 473 785386;
- Silvio C E Tosatto
- Correspondence may also be addressed to Silvio C. E. Tosatto. Tel: +39 049 827 6269;
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Padova 35131, Italy
73
Csizmadia G, Erdős G, Tordai H, Padányi R, Tosatto S, Dosztányi Z, Hegedűs T. The MemMoRF database for recognizing disordered protein regions interacting with cellular membranes. Nucleic Acids Res 2021; 49:D355-D360. [PMID: 33119751 PMCID: PMC7778998 DOI: 10.1093/nar/gkaa954] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/08/2020] [Revised: 09/25/2020] [Accepted: 10/28/2020] [Indexed: 12/19/2022]
Abstract
Protein and lipid membrane interactions play fundamental roles in a large number of cellular processes (e.g. signalling, vesicle trafficking, or viral invasion). A growing number of examples indicate that such interactions can also rely on intrinsically disordered protein regions (IDRs), which can form specific reversible interactions not only with proteins but also with lipids. We named IDRs that undergo such membrane lipid-induced disorder-to-order transitions MemMoRFs, by analogy to Molecular Recognition Features (MoRFs), the IDRs that undergo disorder-to-order transitions upon interaction with protein partners. Currently, both the experimental detection and computational characterization of MemMoRFs are challenging, and information about these regions is scattered throughout the literature. To facilitate related investigations, we generated a comprehensive database of experimentally validated MemMoRFs based on manual curation of literature and structural data. To characterize the dynamics of MemMoRFs, secondary structure propensity and flexibility calculated from nuclear magnetic resonance chemical shifts were incorporated into the database. These data were supplemented with sentences from papers, functional data and disease-related information. The MemMoRF database can be accessed via a user-friendly interface at https://memmorf.hegelab.org, potentially providing a central resource for the characterization of disordered regions in transmembrane and membrane-associated proteins.
Affiliation(s)
- Georgina Csizmadia
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Gábor Erdős
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest 1117, Hungary
- Hedvig Tordai
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Rita Padányi
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Silvio Tosatto
- Department of Biomedical Sciences, University of Padua, Padua 35131, Italy
- Zsuzsanna Dosztányi
- MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest 1117, Hungary
- Tamás Hegedűs
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
74
Abstract
Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity to characterize protein structures directly through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compare our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
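The accuracy gain above is reported as Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, here is a minimal Python sketch of MCC over per-residue binary disorder labels. The function name, the 1 = disordered / 0 = ordered encoding and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary per-residue labels (assumed: 1 = disordered, 0 = ordered)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 10-residue example: reference labels vs. one predictor's calls.
truth = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]
print(round(matthews_corrcoef(truth, pred), 3))  # 0.6
```

MCC is often preferred to plain accuracy for disorder prediction because ordered residues usually outnumber disordered ones. A trivial "everything is ordered" predictor can therefore score high accuracy while its MCC stays at 0.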
Affiliation(s)
- Gal Almog
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Abayomi S Olabode
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, Dental Sciences Building, Rm. 4044 London, Ontario, Canada, N6A 5C1
- Department of Applied Mathematics, Western University, Middlesex College Room 255, 1151 Richmond Street London, Ontario, Canada, N6A 5B7
- Department of Microbiology & Immunology, Western University, 1151 Richmond Street London, Ontario, Canada, N6A 3K
75
Wulff-Fuentes E, Berendt RR, Massman L, Danner L, Malard F, Vora J, Kahsay R, Olivier-Van Stichelen S. The human O-GlcNAcome database and meta-analysis. Sci Data 2021; 8:25. [PMID: 33479245 PMCID: PMC7820439 DOI: 10.1038/s41597-021-00810-4] [Citation(s) in RCA: 125] [Impact Index Per Article: 41.7] [Received: 10/02/2020] [Accepted: 01/05/2021] [Indexed: 02/06/2023]
Abstract
Over the past 35 years, ~1700 articles have characterized protein O-GlcNAcylation. Found in almost all living organisms, this post-translational modification of serine and threonine residues is highly conserved and key to biological processes. With half of the primary research articles using human models, the O-GlcNAcome recently reached a milestone of 5000 human proteins identified. Herein, we provide an extensive inventory of human O-GlcNAcylated proteins, their O-GlcNAc sites, identification methods, and corresponding references (www.oglcnac.mcw.edu). In the absence of a comprehensive online resource for O-GlcNAcylated proteins, this list serves as the only database of O-GlcNAcylated proteins. Based on the thorough analysis of the amino acid sequence surrounding 7002 O-GlcNAc sites, we progress toward a more robust semi-consensus sequence for O-GlcNAcylation. Moreover, we offer a comprehensive meta-analysis of human O-GlcNAcylated proteins for protein domains, cellular and tissue distribution, and pathways in health and diseases, reinforcing that O-GlcNAcylation is a master regulator of cell signaling, equal to the widely studied phosphorylation.
Affiliation(s)
- Rex R Berendt
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, USA
- Logan Massman
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, USA
- Laura Danner
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, USA
- Florian Malard
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, USA
- Jeet Vora
- Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, 20052, USA
- Robel Kahsay
- Department of Biochemistry & Molecular Medicine, The George Washington School of Medicine and Health Sciences, Washington, DC, 20052, USA
76
Piovesan D, Necci M, Escobedo N, Monzon AM, Hatos A, Mičetić I, Quaglia F, Paladin L, Ramasamy P, Dosztányi Z, Vranken WF, Davey N, Parisi G, Fuxreiter M, Tosatto SE. MobiDB: intrinsically disordered proteins in 2021. Nucleic Acids Res 2021; 49:D361-D367. [PMID: 33237329 PMCID: PMC7779018 DOI: 10.1093/nar/gkaa1058] [Citation(s) in RCA: 130] [Impact Index Per Article: 43.3] [Received: 09/15/2020] [Revised: 10/16/2020] [Accepted: 11/19/2020] [Indexed: 12/13/2022]
Abstract
The MobiDB database (URL: https://mobidb.org/) provides predictions and annotations for intrinsically disordered proteins. Here, we report recent developments implemented in MobiDB version 4, regarding the database format, with novel types of annotations and an improved update process. The new website includes a re-designed user interface, a more effective search engine and advanced API for programmatic access. The new database schema gives more flexibility for the users, as well as simplifying the maintenance and updates. In addition, the new entry page provides more visualisation tools including customizable feature viewer and graphs of the residue contact maps. MobiDB v4 annotates the binding modes of disordered proteins, whether they undergo disorder-to-order transitions or remain disordered in the bound state. In addition, disordered regions undergoing liquid-liquid phase separation or post-translational modifications are defined. The integrated information is presented in a simplified interface, which enables faster searches and allows large customized datasets to be downloaded in TSV, Fasta or JSON formats. An alternative advanced interface allows users to drill deeper into features of interest. A new statistics page provides information at database and proteome levels. The new MobiDB version presents state-of-the-art knowledge on disordered proteins and improves data accessibility for both computational and experimental users.
Affiliation(s)
- Damiano Piovesan
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Marco Necci
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Nahuel Escobedo
- Dept. of Science and Technology, Universidad Nacional de Quilmes, Buenos Aires, Argentina
- András Hatos
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Ivan Mičetić
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Federica Quaglia
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Lisanna Paladin
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Pathmanaban Ramasamy
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Centre for Structural Biology, VIB, Pleinlaan 2, 1050 Brussels, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, Ghent 9000, Belgium
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Centre for Structural Biology, VIB, Pleinlaan 2, 1050 Brussels, Belgium
- Norman E Davey
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London, SW3 6JB, UK
- Gustavo Parisi
- Dept. of Science and Technology, Universidad Nacional de Quilmes, Buenos Aires, Argentina
- Monika Fuxreiter
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
- Silvio C E Tosatto
- Dept. of Biomedical Sciences, University of Padua, Via Ugo Bassi 58/B, Padua 35121, Italy
77
Zhao B, Katuwawala A, Oldfield CJ, Dunker AK, Faraggi E, Gsponer J, Kloczkowski A, Malhis N, Mirdita M, Obradovic Z, Söding J, Steinegger M, Zhou Y, Kurgan L. DescribePROT: database of amino acid-level protein structure and function predictions. Nucleic Acids Res 2021; 49:D298-D308. [PMID: 33119734 PMCID: PMC7778963 DOI: 10.1093/nar/gkaa931] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/14/2020] [Revised: 09/11/2020] [Accepted: 10/05/2020] [Indexed: 12/30/2022]
Abstract
We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number or entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors, and can also be downloaded in structured formats at the protein, proteome and whole-database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
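The headline counts are mutually consistent if each of the 13 descriptors contributes one prediction per residue. That reading is an assumption, because the abstract does not state it explicitly:

```latex
13\ \text{descriptors} \times 6 \times 10^{8}\ \text{residues} = 7.8 \times 10^{9}\ \text{predictions}
```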
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- A Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Eshel Faraggi
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine at the Nationwide Children's Hospital, and Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Milot Mirdita
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Zoran Obradovic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Martin Steinegger
- School of Biological Sciences and Institute of Molecular Biology & Genetics, Seoul National University, Seoul, Republic of Korea
- Yaoqi Zhou
- Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
78
Abstract
Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that these predictions be accurate. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in predicting intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times can vary by up to four orders of magnitude across methods.
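Fmax is conventionally the maximum F1 score (harmonic mean of precision and recall) over all decision thresholds applied to a predictor's per-residue scores. The Python sketch below computes it by pooling residues across the dataset and trying every distinct score as a threshold. The pooling strategy, the threshold grid and the toy data are assumptions for illustration, since the abstract does not describe CAID's exact protocol.

```python
def f_max(scores, labels, thresholds=None):
    """Maximum F1 over decision thresholds for per-residue disorder scores.

    scores: predicted disorder propensities (higher = more disordered)
    labels: 1 = disordered, 0 = ordered, pooled over all residues
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    if thresholds is None:
        # Assumed grid: every distinct score is tried as a cut-off.
        thresholds = sorted(set(scores))
    best = 0.0
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # precision and recall are both 0, so F1 = 0
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical six residues: the best cut-off here is 0.4 (P = 0.75, R = 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(f_max(scores, labels), 3))  # 0.857
```

Because each method is scored at its own best threshold, Fmax compares predictors without requiring them to share a calibrated cut-off.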
79
Seoane B, Carbone A. The complexity of protein interactions unravelled from structural disorder. PLoS Comput Biol 2021; 17:e1008546. [PMID: 33417598 PMCID: PMC7846008 DOI: 10.1371/journal.pcbi.1008546] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/13/2020] [Revised: 01/29/2021] [Accepted: 11/18/2020] [Indexed: 11/19/2022]
Abstract
The importance of unstructured biology has grown quickly over the last decades, alongside the explosion in the number of experimentally resolved protein structures. The idea that structural disorder might be a novel mechanism of protein interaction is widespread in the literature, although the number of statistically significant structural studies supporting this idea is surprisingly low. Unlike previous works, our conclusions rely exclusively on a large-scale analysis of all 134337 X-ray crystallographic structures of the Protein Data Bank, averaged over clusters of almost identical protein sequences. In this work, we explore the complexity of the organisation of all the interaction interfaces observed when a protein lies in alternative complexes, showing that interfaces progressively add up in a hierarchical way, which is reflected in a logarithmic law relating the size of the union of the interface regions to the number of distinct interfaces. We further investigate the connection of this complexity with different measures of structural disorder: the standard missing residues and a new definition, called "soft disorder", that covers all the flexible and structurally amorphous residues of a protein. We show evidence that both the interaction interfaces and the soft disordered regions tend to involve roughly the same amino acids of the protein, and present preliminary results suggesting that soft disorder marks those surface regions where new interfaces are progressively accommodated by complex formation. In fact, our results suggest that structurally disordered regions carry crucial information not only about the location of alternative interfaces within complexes, but also about the order of assembly. We verify these hypotheses in several examples, such as the DNA binding domains of P53 and P73, the C3 exoenzyme, and two known biological orders of assembly.
We finally compare our measures of structural disorder with several disorder bioinformatics predictors, showing that the latter are optimised to predict the residues that are missing in all the alternative structures of a protein and are unable to capture the progressive evolution of the disordered regions upon complex formation. Yet the predicted residues, when not missing, tend to be characterised as soft disordered regions.
Affiliation(s)
- Beatriz Seoane
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Paris, France
- Sorbonne Université, Institut des Sciences du Calcul et des Données, Paris, France
- Departamento de Física Teórica, Universidad Complutense, Madrid, Spain
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, Paris, France
80
Ong E, Huang X, Pearce R, Zhang Y, He Y. Computational design of SARS-CoV-2 spike glycoproteins to increase immunogenicity by T cell epitope engineering. Comput Struct Biotechnol J 2020; 19:518-529. [PMID: 33398234 PMCID: PMC7773544 DOI: 10.1016/j.csbj.2020.12.039] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/21/2020] [Revised: 12/24/2020] [Accepted: 12/24/2020] [Indexed: 01/12/2023]
Abstract
The development of effective and safe vaccines is the ultimate way to efficiently stop the ongoing COVID-19 pandemic, which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Given that SARS-CoV-2 uses the association of its Spike (S) protein with the human angiotensin-converting enzyme 2 (ACE2) receptor to invade host cells, we computationally redesigned the S protein sequence to improve its immunogenicity and antigenicity. To this end, we extended an evolutionary protein design algorithm, EvoDesign, to create thousands of stable S protein variants that perturb the core protein sequence but keep the surface conformation and B cell epitopes. The T cell epitope content and similarity scores of the perturbed sequences were calculated and evaluated. Out of 22,914 designs with favorable stability energy, 301 candidates contained at least two pre-existing immunity-related epitopes and had promising immunogenic potential. The benchmark tests showed that, although the epitope restraints were not included in the scoring function of EvoDesign, the top S protein design successfully recovered 31 out of the 32 major histocompatibility complex (MHC)-II T cell promiscuous epitopes in the native S protein, where two epitopes were present in all seven human coronaviruses. Moreover, the newly designed S protein introduced nine new MHC-II T cell promiscuous epitopes that do not exist in the wildtype SARS-CoV-2. These results demonstrate a new and effective avenue to enhance a target protein's immunogenicity using rational protein design, which could be applied to new vaccine design against COVID-19 and other pathogens.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yongqun He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI 48109, USA
81
Peng Z, Xing Q, Kurgan L. APOD: accurate sequence-based predictor of disordered flexible linkers. Bioinformatics 2020; 36:i754-i761. [PMID: 33381830 DOI: 10.1101/2020.12.03.409755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/07/2020] [Indexed: 05/28/2023]
Abstract
MOTIVATION Disordered flexible linkers (DFLs) are abundant and functionally important intrinsically disordered regions that connect protein domains and structural elements within domains, and that facilitate disorder-based allosteric regulation. Although computational estimates suggest that thousands of proteins have DFLs, DFLs have been annotated experimentally in <200 proteins. This substantial annotation gap can be reduced with the help of accurate computational predictors. The sole predictor of DFLs, DFLpred, trades accuracy for shorter runtime by excluding relevant but computationally costly predictive inputs. Moreover, it relies on local/window-based information and fails to consider useful protein-level characteristics. RESULTS We conceptualize, design and test APOD (Accurate Predictor Of DFLs), the first highly accurate predictor that utilizes both local- and protein-level inputs that quantify propensity for disorder, sequence composition, sequence conservation and selected putative structural properties. Consequently, APOD offers significantly more accurate predictions than its faster predecessor, DFLpred, and several other alternative ways to predict DFLs. These improvements stem from the use of a more comprehensive set of inputs that cover protein-level information and the application of a more sophisticated predictive model, a well-parametrized support vector machine. APOD achieves area under the curve = 0.82 (28% improvement over DFLpred) and Matthews correlation coefficient = 0.42 (180% increase over DFLpred) when tested on an independent/low-similarity test dataset. Consequently, APOD is a suitable choice for accurate, small-scale prediction of DFLs. AVAILABILITY AND IMPLEMENTATION https://yanglab.nankai.edu.cn/APOD/.
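If the two percentages are read as relative gains over DFLpred's own scores, the implied DFLpred baselines can be back-calculated. This is an interpretation; the abstract does not report DFLpred's values directly, and rounding in the quoted figures makes these approximate:

```latex
\mathrm{AUC}_{\text{DFLpred}} \approx \frac{0.82}{1 + 0.28} \approx 0.64,
\qquad
\mathrm{MCC}_{\text{DFLpred}} \approx \frac{0.42}{1 + 1.80} = 0.15
```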
Affiliation(s)
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- School of Statistics and Data Science, Nankai University, Tianjin 300074, China
- Qian Xing
- Center for Applied Mathematics, Tianjin University, Tianjin 300072, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
82
Mier P, Andrade-Navarro MA. Assessing the low complexity of protein sequences via the low complexity triangle. PLoS One 2020; 15:e0239154. [PMID: 33378336 PMCID: PMC7773278 DOI: 10.1371/journal.pone.0239154] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022]
Abstract
Background Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the proteome-wide expectation, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results We combine two local measures of low complexity, repeatability (using the RES algorithm) and the fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called ‘low complexity triangle’ as a proof of concept to represent the measured low complexity values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. Conclusions The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
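Of the two measures combined above, the fraction of the most frequent amino acid is simple enough to sketch directly. The Python below computes it for a whole sequence and over sliding windows. The window length of 20 and the helper names are illustrative assumptions, not LCT's settings, and the RES repeatability score is not reproduced here.

```python
from collections import Counter


def most_frequent_fraction(seq):
    """Fraction of residues accounted for by the single most common amino acid."""
    if not seq:
        raise ValueError("empty sequence")
    return Counter(seq).most_common(1)[0][1] / len(seq)


def windowed_fractions(seq, window=20):
    """Most-frequent-residue fraction for each full-length window (assumed window size)."""
    if len(seq) < window:
        return [most_frequent_fraction(seq)]
    return [most_frequent_fraction(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


print(most_frequent_fraction("QQQQQQQQQQ"))  # 1.0 (homorepeat)
print(most_frequent_fraction("QPQPQPQPQP"))  # 0.5 (direpeat)
print(most_frequent_fraction("MKTAYIAKQR"))  # 0.2 (mixed composition)
```

This measure alone cannot separate a direpeat from a compositionally biased but unordered region with the same top-residue share. Pairing it with a repeatability score, as the authors do, helps resolve that ambiguity.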
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
- Miguel A. Andrade-Navarro
- Faculty of Biology, Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Mainz, Germany
83
Hardenberg M, Horvath A, Ambrus V, Fuxreiter M, Vendruscolo M. Widespread occurrence of the droplet state of proteins in the human proteome. Proc Natl Acad Sci U S A 2020; 117:33254-33262. [PMID: 33318217 PMCID: PMC7777240 DOI: 10.1073/pnas.2007670117] [Citation(s) in RCA: 153] [Impact Index Per Article: 38.3] [Indexed: 12/22/2022]
Abstract
A wide range of proteins have been reported to condense into a dense liquid phase, forming a reversible droplet state. Failure in the control of the droplet state can lead to the formation of the more stable amyloid state, which is often disease-related. These observations prompt the question of how many proteins can undergo liquid-liquid phase separation. Here, in order to address this problem, we discuss the biophysical principles underlying the droplet state of proteins by analyzing current evidence for droplet-driver and droplet-client proteins. Based on the concept that the droplet state is stabilized by the large conformational entropy associated with nonspecific side-chain interactions, we develop the FuzDrop method to predict droplet-promoting regions and proteins, which can spontaneously phase separate. We use this approach to carry out a proteome-level study to rank proteins according to their propensity to form the droplet state, spontaneously or via partner interactions. Our results lead to the conclusion that the droplet state could be, at least transiently, accessible to most proteins under conditions found in the cellular environment.
Affiliation(s)
- Maarten Hardenberg
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
- Attila Horvath
- The John Curtin School of Medical Research, The Australian National University, Canberra, ACT 2601, Australia
- Viktor Ambrus
- Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4010 Debrecen, Hungary
- Monika Fuxreiter
- Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4010 Debrecen, Hungary
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Michele Vendruscolo
- Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom
84
Anbo H, Amagai H, Fukuchi S. NeProc predicts binding segments in intrinsically disordered regions without learning binding region sequences. Biophys Physicobiol 2020; 17:147-154. [PMID: 33304713 PMCID: PMC7692026 DOI: 10.2142/biophysico.bsj-2020026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 10/29/2020] [Indexed: 12/01/2022]
Abstract
Intrinsically disordered proteins are proteins that contain intrinsically disordered regions. One of the unique characteristics of intrinsically disordered proteins is the existence of functional segments in intrinsically disordered regions. These segments are involved in binding to partner molecules, such as proteins and DNA, and play important roles in signaling pathways and/or transcriptional regulation. Although there are databases that gather information on such disordered binding regions, data remain limited. Therefore, it is desirable to develop programs that predict disordered binding regions without using data for the binding regions. We developed a program, NeProc, to predict disordered binding regions, which can be regarded as intrinsically disordered regions with a structural propensity. We used only data for structural domains and intrinsically disordered regions to detect such regions. NeProc accepts a query amino acid sequence converted into a position specific score matrix, and uses two neural networks that employ different window sizes: one with short windows and one with long windows. The performance of NeProc was comparable to that of existing programs for disordered binding region prediction. This result suggests that it may be possible to overcome the shortage of disordered binding region data when developing prediction programs for these regions. NeProc is available at http://flab.neproc.org/neproc/index.html.
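The abstract only outlines how NeProc feeds a position specific score matrix (PSSM) to two networks with different window sizes. The Python sketch below illustrates that general idea. For each residue it builds one feature vector from a short window and one from a long window centered on that residue, zero-padding beyond the termini. The window sizes (7 and 31), the zero-padding and the flattening are assumptions for illustration, not NeProc's published settings, and the neural networks themselves are omitted.

```python
from typing import List, Tuple

Row = List[float]  # one PSSM row: one score per amino acid type for one residue


def window_features(pssm: List[Row], center: int, size: int) -> List[float]:
    """Flatten the PSSM rows in an odd-sized window centered on `center`.

    Positions beyond either terminus are zero-padded, so every residue
    gets a feature vector of identical length (size * row width).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    width = len(pssm[0])
    half = size // 2
    features: List[float] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * width)
    return features


def two_window_inputs(pssm: List[Row], short: int = 7,
                      long: int = 31) -> List[Tuple[List[float], List[float]]]:
    """Per-residue (short-window, long-window) inputs; sizes are hypothetical."""
    return [(window_features(pssm, i, short), window_features(pssm, i, long))
            for i in range(len(pssm))]


# Toy 5-residue "PSSM" with 20 columns per row.
toy = [[float(i)] * 20 for i in range(5)]
inputs = two_window_inputs(toy)
print(len(inputs), len(inputs[0][0]), len(inputs[0][1]))  # 5 140 620
```

The short window captures local sequence context around a residue, while the long window lets a second model see whether the residue sits inside a longer disordered stretch. Both views are plausible inputs for detecting binding segments within disordered regions.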
Affiliation(s)
- Hiroto Anbo
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
- Hiroki Amagai
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
- Satoshi Fukuchi
- Department of Life Science and Informatics, Faculty of Engineering, Maebashi Institute of Technology, Maebashi, Gunma 371-0816, Japan
85
The Role of Protein Disorder in Nuclear Transport and in Its Subversion by Viruses. Cells 2020; 9:cells9122654. [PMID: 33321790 PMCID: PMC7764567 DOI: 10.3390/cells9122654] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/12/2020] [Revised: 12/08/2020] [Accepted: 12/08/2020] [Indexed: 12/12/2022]
Abstract
The transport of host proteins into and out of the nucleus is key to host function. However, nuclear transport is restricted by nuclear pores that perforate the nuclear envelope. Protein intrinsic disorder is an inherent feature of this selective transport barrier and is also a feature of the nuclear transport receptors that facilitate the active nuclear transport of cargo, and the nuclear transport signals on the cargo itself. Furthermore, intrinsic disorder is an inherent feature of viral proteins and viral strategies to disrupt host nucleocytoplasmic transport to benefit their replication. In this review, we highlight the role that intrinsic disorder plays in the nuclear transport of host and viral proteins. We also describe viral subversion mechanisms of the host nuclear transport machinery in which intrinsic disorder is a feature. Finally, we discuss nuclear import and export as therapeutic targets for viral infectious disease.
86
Katuwawala A, Kurgan L. Comparative Assessment of Intrinsic Disorder Predictions with a Focus on Protein and Nucleic Acid-Binding Proteins. Biomolecules 2020; 10:E1636. [PMID: 33291838 PMCID: PMC7762010 DOI: 10.3390/biom10121636] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/17/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 01/18/2023]
Abstract
With over 60 disorder predictors available, users need help choosing among them. We review 28 surveys of disorder predictors, showing that only 11 include an assessment of predictive performance. We identify and address several drawbacks of these past surveys. To this end, we release a novel benchmark dataset with reduced similarity to the training sets of the considered predictors. We use this dataset to perform a first-of-its-kind comparative analysis that targets two large functional families of disordered proteins: those that interact with proteins and those that interact with nucleic acids. We show that limiting sequence similarity between the benchmark and the training datasets has a substantial impact on predictive performance. We also demonstrate that predictive quality is sensitive to the use of well-annotated order and to the inclusion of fully structured proteins in the benchmark datasets, both of which should be considered in future assessments. We identify three predictors that provide favorable results on the new benchmark set. While VSL2B offers the most accurate and robust results overall, ESpritz-DisProt and SPOT-Disorder perform particularly well for disordered proteins. Moreover, predictions for disordered protein-binding proteins have lower predictive quality than those for generic disordered proteins and disordered nucleic acid-binding proteins. This can be explained by the high disorder content of the disordered protein-binding proteins, which makes it difficult for current methods to accurately identify ordered regions in these proteins. This finding motivates the development of a new generation of methods targeting these difficult-to-predict disordered proteins. We also discuss resources that support users in collecting and identifying high-quality disorder predictions.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
87
Brocca S, Grandori R, Longhi S, Uversky V. Liquid-Liquid Phase Separation by Intrinsically Disordered Protein Regions of Viruses: Roles in Viral Life Cycle and Control of Virus-Host Interactions. Int J Mol Sci 2020; 21:E9045. [PMID: 33260713 PMCID: PMC7730420 DOI: 10.3390/ijms21239045] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 10/31/2020] [Revised: 11/23/2020] [Accepted: 11/24/2020] [Indexed: 12/13/2022]
Abstract
Intrinsically disordered proteins (IDPs) are unable to adopt a unique 3D structure under physiological conditions and thus exist as highly dynamic conformational ensembles. IDPs are ubiquitous and widely spread in the protein realm. In the last decade, compelling experimental evidence has been gathered, pointing to the ability of IDPs and intrinsically disordered regions (IDRs) to undergo liquid-liquid phase separation (LLPS), a phenomenon driving the formation of membrane-less organelles (MLOs). These biological condensates play a critical role in the spatio-temporal organization of the cell, where they exert a multitude of key biological functions, ranging from transcriptional regulation and silencing to control of signal transduction networks. After introducing IDPs and LLPS, we herein survey available data on LLPS by IDPs/IDRs of viral origin and discuss their functional implications. We distinguish LLPS associated with viral replication and trafficking of viral components, from the LLPS-mediated interference of viruses with host cell functions. We discuss emerging evidence on the ability of plant virus proteins to interfere with the regulation of MLOs of the host and propose that bacteriophages can interfere with bacterial LLPS, as well. We conclude by discussing how LLPS could be targeted to treat phase separation-associated diseases, including viral infections.
Affiliation(s)
- Stefania Brocca
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, 20126 Milano, Italy
- Rita Grandori
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, 20126 Milano, Italy
- Sonia Longhi
- Laboratoire Architecture et Fonction des Macromolécules Biologiques (AFMB), Aix-Marseille University and CNRS, 13288 Marseille, France
- Vladimir Uversky
- Department of Molecular Medicine, Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33601, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia
88
Goh GKM, Dunker AK, Foster JA, Uversky VN. A Novel Strategy for the Development of Vaccines for SARS-CoV-2 (COVID-19) and Other Viruses Using AI and Viral Shell Disorder. J Proteome Res 2020; 19:4355-4363. [PMID: 33006287 PMCID: PMC7640981 DOI: 10.1021/acs.jproteome.0c00672] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/30/2020] [Indexed: 12/29/2022]
Abstract
A model that predicts levels of coronavirus (CoV) respiratory and fecal-oral transmission potentials based on the shell disorder has been built using neural network (artificial intelligence, AI) analysis of the percentage of disorder (PID) in the nucleocapsid, N, and membrane, M, proteins of the inner and outer viral shells, respectively. Using primarily the PID of N, SARS-CoV-2 is grouped as having intermediate levels of both respiratory and fecal-oral transmission potentials. Related studies, using similar methodologies, have found strong positive correlations between virulence and inner shell disorder among numerous viruses, including Nipah, Ebola, and Dengue viruses. There is some evidence that this is also true for SARS-CoV-2 and SARS-CoV, which have N PIDs of 48% and 50%, and case-fatality rates of 0.5-5% and 10.9%, respectively. The underlying relationship between virulence and respiratory potentials has to do with the viral loads of vital organs and body fluids, respectively. Viruses can spread by respiratory means only if the viral loads in saliva and mucus exceed certain minima. Similarly, a patient is likelier to die when the viral load overwhelms vital organs. Greater disorder in inner shell proteins has been known to play important roles in the rapid replication of viruses by enhancing the efficiency pertaining to protein-protein/DNA/RNA/lipid bindings. This paper suggests a novel strategy in attenuating viruses involving comparison of disorder patterns of inner shells (N) of related viruses to identify residues and regions that could be ideal for mutation. The M protein of SARS-CoV-2 has one of the lowest M PID values (6%) in its family, and therefore, this virus has one of the hardest outer shells, which makes it resistant to antimicrobial enzymes in body fluid. While this is likely responsible for its greater contagiousness, the risks of creating an attenuated virus with a more disordered M are discussed.
Affiliation(s)
- A. Keith Dunker
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- James A. Foster
- Department of Biological Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho 83844, United States
- Vladimir N. Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33620, United States
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Moscow region 142290, Russia
89
Aledo JC, Aledo P. Susceptibility of Protein Methionine Oxidation in Response to Hydrogen Peroxide Treatment-Ex Vivo Versus In Vitro: A Computational Insight. Antioxidants (Basel) 2020; 9:antiox9100987. [PMID: 33066324 PMCID: PMC7602125 DOI: 10.3390/antiox9100987] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/16/2020] [Revised: 10/08/2020] [Accepted: 10/09/2020] [Indexed: 11/25/2022]
Abstract
Methionine oxidation plays a relevant role in cell signaling. Recently, we built a database containing thousands of proteins identified as sulfoxidation targets. Using this resource, we have now developed a computational approach aimed at characterizing the oxidation of human methionyl residues. We found that proteins oxidized in both cell-free preparations (in vitro) and inside living cells (ex vivo) were enriched in methionines and intrinsically disordered regions. However, proteins oxidized ex vivo tended to be larger and less abundant than those oxidized in vitro. Another distinctive feature was their subcellular localization: nuclear and mitochondrial proteins were preferentially oxidized ex vivo but not in vitro. In a network based on gene ontology terms, the nodes corresponding to ex vivo and in vitro oxidized proteins showed assortative mixing, suggesting that ex vivo oxidized proteins share molecular functions and biological processes. This was further supported by the observation that proteins from the ex vivo set were co-regulated more often than expected by chance. We also investigated the sequence environment of oxidation sites. Glutamate and aspartate were overrepresented in these environments regardless of the group. In contrast, tyrosine, tryptophan and histidine were clearly avoided, but only in the environments of the ex vivo sites. A hypothetical mechanism of methionine oxidation that accounts for these observations is presented.
90
Jarnot P, Ziemska-Legiecka J, Dobson L, Merski M, Mier P, Andrade-Navarro MA, Hancock JM, Dosztányi Z, Paladin L, Necci M, Piovesan D, Tosatto SCE, Promponas VJ, Grynberg M, Gruca A. PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins. Nucleic Acids Res 2020; 48:W77-W84. [PMID: 32421769 PMCID: PMC7319588 DOI: 10.1093/nar/gkaa339] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 03/07/2020] [Revised: 04/08/2020] [Accepted: 05/01/2020] [Indexed: 12/25/2022]
Abstract
Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition than is typically observed. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein function and in disease. Over the past decades, several methods have been developed to identify LCRs or regions of amino acid bias, but most are stand-alone applications, and there is currently no web-based tool that lets users explore LCRs in protein sequences together with functional annotations. We aim to fill this gap with PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results for a query sequence can be obtained. With the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs, with additional information to aid interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.
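To make "less diverse amino acid composition" concrete, a common first pass is to score sliding windows by Shannon entropy and flag low-entropy stretches. The sketch below is a simplified illustration under stated assumptions: the window length of 12 and the 2.2-bit threshold echo the commonly cited SEG trigger defaults, but this is not SEG itself, nor any of the five tools PlaToLoCo wraps, and it omits SEG's second extension and optimization stage. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12,
                           max_entropy: float = 2.2) -> List[Tuple[int, int]]:
    """Return merged 0-based, half-open (start, end) intervals covered by
    windows whose compositional entropy is at most max_entropy bits.
    Assumption: a simplified, SEG-like single-pass filter."""
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            for j in range(i, i + window):
                flagged[j] = True
    # Merge consecutive flagged positions into intervals.
    regions: List[Tuple[int, int]] = []
    start = None
    for i, is_low in enumerate(flagged + [False]):  # sentinel closes a trailing run
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            regions.append((start, i))
            start = None
    return regions


print(low_complexity_regions("ACDEFGHIKLMN" + "Q" * 20 + "PRSTVWYACDEF"))
# → [(7, 37)]
```

The flagged interval extends a few residues beyond the poly-Q stretch (positions 12 to 31) because any window containing seven or more glutamines falls below the threshold; real LCR tools refine such boundaries, which is one reason a meta-server that compares several of them is useful.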
Affiliation(s)
- Patryk Jarnot
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Laszlo Dobson
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083 Budapest, Hungary; Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, 1117 Budapest, Hungary
- Matthew Merski
- Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
- John M Hancock
- ELIXIR, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Pázmány Péter stny 1/c 1117, Budapest, Hungary
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Marco Necci
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, P.O. Box 20537, Nicosia, CY 1678, Cyprus
- Marcin Grynberg
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5A, 02-106 Warsaw, Poland
- Aleksandra Gruca
- Department of Computer Networks and Systems, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
91
Carmi G, Tagore S, Gorohovski A, Sivan A, Raviv-Shay D, Frenkel-Morgenstern M. Design principles of gene evolution for niche adaptation through changes in protein-protein interaction networks. Sci Rep 2020; 10:15628. [PMID: 32973219 PMCID: PMC7519090 DOI: 10.1038/s41598-020-71976-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2019] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
In contrast to fossorial and above-ground organisms, subterranean species have adapted to the extreme stresses of living underground. We analyzed the predicted protein–protein interactions (PPIs) of all gene products, including those of stress-response genes, in nine subterranean, ten fossorial, and 13 above-ground species. We considered 10,314 unique orthologous protein families and constructed 5,879,879 PPIs across all organisms using ChiPPI. We found a strong association between PPI network modulation and adaptation to specific habitats, whereas mutations in genes and changes in protein sequences were not directly linked with niche adaptation in the organisms sampled. Thus, orthologous hypoxia, heat-shock, and circadian clock proteins clustered according to habitat based on PPIs rather than on sequence similarity. Curiously, "ordered" domains were preserved in above-ground species, while "disordered" domains were conserved in subterranean organisms, a pattern confirmed for proteins in the DisProt database. Furthermore, proteins with disordered regions showed significantly less optimal codon usage in subterranean species than in fossorial and above-ground species. These findings reveal design principles of protein networks based on alterations in protein domains, providing insight into deep mechanisms of evolutionary adaptation in general, and of adaptation to underground living and other confined habitats in particular.
Affiliation(s)
- Gon Carmi
- The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel
- Somnath Tagore
- The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel; Department of Systems Biology, Columbia University Medical Center, Herbert Irving Cancer Research Center, New York, USA
- Alessandro Gorohovski
- The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel
- Aviad Sivan
- The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel
- Dorith Raviv-Shay
- The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel
92
Relevance of Electrostatic Charges in Compactness, Aggregation, and Phase Separation of Intrinsically Disordered Proteins. Int J Mol Sci 2020; 21:ijms21176208. [PMID: 32867340 PMCID: PMC7503639 DOI: 10.3390/ijms21176208] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 07/31/2020] [Revised: 08/22/2020] [Accepted: 08/23/2020] [Indexed: 12/20/2022]
Abstract
The abundance of intrinsic disorder in the protein realm and its role in a variety of physiological and pathological cellular events have strengthened the scientific community's interest in the structural and dynamical properties of intrinsically disordered proteins (IDPs) and regions (IDRs). Attempts to rationalize the general principles underlying both the conformational properties and the transitions of IDPs/IDRs must consider the abundance of charged residues (Asp, Glu, Lys, and Arg) that typifies these proteins and makes them comparable to polyampholytes or polyelectrolytes. Their conformation depends strongly on both charge density and charge distribution along the sequence (i.e., charge decoration), as highlighted by recent experimental and theoretical studies that have introduced novel descriptors. Published experimental data are revisited here within this formalism, from a new and possibly unifying perspective. The physicochemical properties most directly affected by charge density and distribution are compaction and solubility, which can be described in a relatively simple way with the tools of polymer physics. Dissecting the factors that control these properties could help in understanding complex biological phenomena, such as fibrillation and phase separation. Furthermore, this knowledge is expected to have enormous practical implications for the design, synthesis, and exploitation of bio-derived materials and for the control of natural biological processes.
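The abstract distinguishes charge density from charge decoration but does not name its descriptors. As an illustration only (an assumption, not necessarily the descriptors the review uses), the sketch below computes three widely used sequence-level quantities: the fraction of charged residues (FCR), the net charge per residue (NCPR), and the sequence charge decoration (SCD) parameter from the polymer-physics literature. Charges are simplified to +1 for Lys/Arg and -1 for Asp/Glu, ignoring His and the termini; the helper names are hypothetical.

```python
import math
from typing import List

# Assumption: integer side-chain charges near neutral pH; His and termini ignored.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charges(seq: str) -> List[int]:
    """Per-residue charges under the simplified scheme above."""
    return [CHARGE.get(aa, 0) for aa in seq.upper()]


def fcr(seq: str) -> float:
    """Fraction of charged residues (charge density, ignoring sign)."""
    q = charges(seq)
    return sum(1 for c in q if c != 0) / len(q)


def ncpr(seq: str) -> float:
    """Net charge per residue."""
    q = charges(seq)
    return sum(q) / len(q)


def scd(seq: str) -> float:
    """Sequence charge decoration: (1/N) * sum over pairs i < j of
    q_i * q_j * sqrt(j - i). More negative values indicate that opposite
    charges are segregated into blocks rather than interspersed."""
    q = charges(seq)
    total = 0.0
    for j in range(len(q)):
        for i in range(j):
            total += q[i] * q[j] * math.sqrt(j - i)
    return total / len(q)


for s in ("KEKEKEKE", "KKKKEEEE"):
    print(s, fcr(s), ncpr(s), round(scd(s), 2))
```

Both test sequences have the same charge density (FCR = 1.0, NCPR = 0.0), yet the blocky sequence gives a much more negative SCD (about -2.02 versus -0.45). That difference in decoration, invisible to composition-only measures, is the kind of distinction the abstract attributes to the newer descriptors.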
93
DispHred: A Server to Predict pH-Dependent Order-Disorder Transitions in Intrinsically Disordered Proteins. Int J Mol Sci 2020; 21:ijms21165814. [PMID: 32823616 PMCID: PMC7461198 DOI: 10.3390/ijms21165814] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/25/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 12/24/2022]
Abstract
The natively unfolded nature of intrinsically disordered proteins (IDPs) relies on several physicochemical principles, of which the balance between low sequence hydrophobicity and high net charge appears to be critical. Under this premise, it is well known that disordered proteins populate a defined region of the charge–hydropathy (C–H) space and that a linear boundary is sufficient to distinguish between folded and disordered proteins, an approach widely applied to predict protein disorder. Nevertheless, the C–H relation of a protein is clearly not fixed; it can be modulated by factors extrinsic to its sequence. Here, we applied a C–H-based analysis to develop a computational approach that evaluates sequence disorder as a function of pH, assuming that both protein net charge and hydrophobicity depend on solution pH. On that basis, we developed DispHred, the first pH-dependent predictor of protein disorder. Despite its simplicity, DispHred displays very high accuracy in identifying pH-induced order/disorder protein transitions. DispHred might be useful for diverse applications, from the analysis of conditionally disordered segments to the synthetic design of disorder tags for biotechnological applications. Importantly, since many disorder predictors use hydrophobicity as an input, the framework developed here can be implemented in other state-of-the-art algorithms.
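The C–H reasoning above can be sketched with the widely cited linear boundary of the classic charge–hydropathy plot, <H> = (<R> + 1.151) / 2.785, where <R> is the mean absolute net charge per residue and <H> the mean Kyte-Doolittle hydropathy rescaled to [0, 1]. A sequence whose <H> falls below the boundary is classed as disordered. This is a minimal illustration, not DispHred: to mimic the pH idea, net charge is estimated with the Henderson-Hasselbalch equation using approximate pKa values (an assumption), hydropathy is left pH-independent (DispHred also makes hydrophobicity pH-dependent), no sliding-window smoothing is applied, and the boundary constants were originally calibrated with a different charge definition. Function names are hypothetical, and only the 20 standard residues are handled.

```python
# Kyte-Doolittle hydropathy scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Assumption: approximate textbook pKa values, not the values DispHred uses.
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # negative once deprotonated
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}            # positive while protonated
PKA_N_TERM, PKA_C_TERM = 9.0, 2.0


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def mean_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle score rescaled from [-4.5, 4.5] to [0, 1].
    Simplification: no sliding-window smoothing is applied."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def predicted_disordered(seq: str, ph: float = 7.0) -> bool:
    """True if (<R>, <H>) lies on the disordered side of the linear C-H boundary."""
    mean_charge = abs(net_charge(seq, ph)) / len(seq)  # <R>
    boundary = (mean_charge + 1.151) / 2.785          # <H> on the boundary line
    return mean_hydropathy(seq) < boundary


print(predicted_disordered("E" * 10), predicted_disordered("L" * 10))
# → True False
```

Because <R> enters the boundary, lowering the pH toward the pKa of Glu shrinks the net charge of a Glu-rich sequence and moves it toward the ordered side; in this simplified version only charge responds to pH, so the shift is smaller than in a model where hydrophobicity also changes.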
94
Harrison PM. Variable absorption of mutational trends by prion-forming domains during Saccharomycetes evolution. PeerJ 2020; 8:e9669. [PMID: 32844065 PMCID: PMC7415223 DOI: 10.7717/peerj.9669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/09/2020] [Accepted: 07/16/2020] [Indexed: 12/13/2022]
Abstract
Prions are self-propagating alternative states of protein domains. They are linked to both diseases and functional protein roles in eukaryotes. Prion-forming domains in Saccharomyces cerevisiae are typically domains with high intrinsic protein disorder (i.e., that remain unfolded in the cell during at least some part of their functioning), that are converted to self-replicating amyloid forms. S. cerevisiae is a member of the fungal class Saccharomycetes, during the evolution of which a large population of prion-like domains has appeared. It is still unclear what principles might govern the molecular evolution of prion-forming domains, and intrinsically disordered domains generally. Here, it is discovered that in a set of such prion-forming domains some evolve in the fungal class Saccharomycetes in such a way as to absorb general mutation biases across millions of years, whereas others do not, indicating a spectrum of selection pressures on composition and sequence. Thus, if the bias-absorbing prion formers are conserving a prion-forming capability, then this capability is not interfered with by the absorption of bias changes over the duration of evolutionary epochs. Evidence is discovered for selective constraint against the occurrence of lysine residues (which likely disrupt prion formation) in S. cerevisiae prion-forming domains as they evolve across Saccharomycetes. These results provide a case study of the absorption of mutational trends by compositionally biased domains, and suggest methodology for assessing selection pressures on the composition of intrinsically disordered regions.
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, Quebec, Canada
95
Liu H, Jeffery CJ. Moonlighting Proteins in the Fuzzy Logic of Cellular Metabolism. Molecules 2020; 25:molecules25153440. [PMID: 32751110 PMCID: PMC7435893 DOI: 10.3390/molecules25153440] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/30/2020] [Revised: 07/09/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
The numerous interconnected biochemical pathways that make up the metabolism of a living cell comprise a fuzzy logic system because of their high level of complexity and our inability to fully understand, predict, and model their many activities, how these interact, and how they are regulated. Each cell contains thousands of proteins with changing levels of expression, levels of activity, and patterns of interaction. A further layer of complexity comes from the many proteins that have multiple functions. Moonlighting proteins include a wide variety of proteins in which two or more functions are performed by one polypeptide chain. In this article, we discuss examples of proteins with variable functions that contribute to the fuzziness of cellular metabolism.
Affiliation(s)
- Haipeng Liu
- Center for Biomolecular Sciences, College of Pharmacy, University of Illinois at Chicago, 900 South Ashland Avenue, Chicago, IL 60607, USA
- Constance J. Jeffery
- Department of Biological Sciences, University of Illinois at Chicago, 900 South Ashland Avenue, Chicago, IL 60607, USA
- Correspondence: Tel.: +1-312-996-3168
96
Monzon AM, Necci M, Quaglia F, Walsh I, Zanotti G, Piovesan D, Tosatto SCE. Experimentally Determined Long Intrinsically Disordered Protein Regions Are Now Abundant in the Protein Data Bank. Int J Mol Sci 2020; 21:ijms21124496. [PMID: 32599863 PMCID: PMC7349999 DOI: 10.3390/ijms21124496] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 05/25/2020] [Revised: 06/18/2020] [Accepted: 06/19/2020] [Indexed: 01/12/2023]
Abstract
Intrinsically disordered protein regions are commonly defined from missing electron density in X-ray structures. Experimental evidence for long disorder regions (LDRs) of at least 30 residues had so far been limited to manually curated proteins. Here, we describe a comprehensive, large-scale analysis of experimental LDRs for 3133 unique proteins, demonstrating increasing coverage of intrinsic disorder in the Protein Data Bank (PDB) over the last decade. The results suggest that long regions of missing residues are a good-quality source for annotating intrinsically disordered regions and performing functional analysis in large data sets. The consensus approach used to define LDRs makes it possible to evaluate context-dependent disorder and provides a common definition at the protein level.
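The LDR criterion above (a run of at least 30 consecutive residues absent from the modelled X-ray structure) can be sketched as a simple scan over residue positions. This is a minimal illustration under stated assumptions: the observed residue numbers are taken to be already mapped onto 1-based sequence positions, only a single structure is considered, and the authors' consensus across multiple PDB entries is not reproduced. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def long_missing_regions(length: int, observed: Iterable[int],
                         min_len: int = 30) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs of residues that are absent
    from the structure (i.e. not in `observed`) and at least min_len long."""
    seen = set(observed)
    regions: List[Tuple[int, int]] = []
    start = None
    for pos in range(1, length + 1):
        if pos not in seen:
            if start is None:
                start = pos  # a missing run begins
        else:
            if start is not None and pos - start >= min_len:
                regions.append((start, pos - 1))
            start = None
    # Handle a missing run that extends to the C-terminus.
    if start is not None and length - start + 1 >= min_len:
        regions.append((start, length))
    return regions


# Residues 21-60 (40 residues) have no electron density in this toy example.
print(long_missing_regions(100, list(range(1, 21)) + list(range(61, 101))))
# → [(21, 60)]
```

Shorter gaps, such as a missing loop of 11 residues, are ignored by design; a consensus version could apply the same scan to a per-residue "missing in most structures" mask instead of a single chain.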
Affiliation(s)
- Alexander Miguel Monzon
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Marco Necci
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Federica Quaglia
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Ian Walsh
- Bioprocessing Technology Institute, A*STAR, Singapore 138668, Singapore
- Giuseppe Zanotti
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Corresponding author
- Silvio C. E. Tosatto
- Department of Biomedical Sciences, University of Padua, 35131 Padua, Italy
- Corresponding author
97
Rademaker D, van Dijk J, Titulaer W, Lange J, Vriend G, Xue L. The Future of Protein Secondary Structure Prediction Was Invented by Oleg Ptitsyn. Biomolecules 2020; 10:biom10060910. [PMID: 32560074 PMCID: PMC7355469 DOI: 10.3390/biom10060910] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Accepted: 06/02/2020] [Indexed: 01/15/2023]
Abstract
When Oleg Ptitsyn and his group published the first secondary structure prediction for a protein sequence, they started a research field that is still active today. Oleg Ptitsyn combined fundamental rules of physics with human understanding of protein structures. Most followers in this field, however, use machine learning methods and aim for the highest (average) percentage of correctly predicted residues in a set of proteins that were not used to train the prediction method. We show that a single method is unlikely to predict the secondary structure of all protein sequences, with the possible exception of future deep learning methods based on very large neural networks, and we suggest that some concepts pioneered by Oleg Ptitsyn and his group in the 1970s are likely today’s best way forward in the protein secondary structure prediction field.
Affiliation(s)
- Daniel Rademaker
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 6525 GA Nijmegen, The Netherlands
- Jarek van Dijk
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 6525 GA Nijmegen, The Netherlands
- Willem Titulaer
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 6525 GA Nijmegen, The Netherlands
- Gert Vriend
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 6525 GA Nijmegen, The Netherlands
- Baco Institute of Protein Science (BIPS), Mindoro 5201, Philippines
- Li Xue
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 6525 GA Nijmegen, The Netherlands
- Corresponding author
98
Genomic Analysis of Intrinsically Disordered Proteins in the Genus Camelus. Int J Mol Sci 2020; 21:ijms21114010. [PMID: 32503351 PMCID: PMC7312968 DOI: 10.3390/ijms21114010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 05/14/2020] [Accepted: 05/18/2020] [Indexed: 12/11/2022]
Abstract
Intrinsically disordered proteins/regions (IDPs/IDRs) fail to fold completely into 3D structures, but have major roles in determining protein function. While natively disordered proteins/regions have been found to fulfill a wide variety of primary cellular roles, the functions of many disordered proteins in numerous species remain to be uncovered. Here, we perform the first large-scale study of IDPs/IDRs in the genus Camelus, one of the most important mammals in Asia and North Africa, in order to explore the biological roles of these proteins. The study includes the prediction of disordered proteins/regions in Camelus species and in humans using multiple state-of-the-art prediction tools. Additionally, we provide a comparative analysis of Camelus and Homo sapiens IDPs/IDRs to highlight the distinctive use of disorder in each genus. Our findings indicate that the human proteome is more disordered than the Camelus proteome. Gene Ontology analysis also revealed that Camelus IDPs are enriched in glutathione catabolism and lactose biosynthesis.
99
Langenberg T, Gallardo R, van der Kant R, Louros N, Michiels E, Duran-Romaña R, Houben B, Cassio R, Wilkinson H, Garcia T, Ulens C, Van Durme J, Rousseau F, Schymkowitz J. Thermodynamic and Evolutionary Coupling between the Native and Amyloid State of Globular Proteins. Cell Rep 2020; 31:107512. [PMID: 32294448 PMCID: PMC7175379 DOI: 10.1016/j.celrep.2020.03.076] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/18/2019] [Revised: 01/12/2020] [Accepted: 03/23/2020] [Indexed: 11/19/2022]
Abstract
The amyloid-like aggregation propensity present in most globular proteins is generally considered to be a secondary side effect resulting from the requirements of protein stability. Here, we demonstrate, however, that mutations in the globular and amyloid state are thermodynamically correlated rather than simply associated. In addition, we show that the standard genetic code couples this structural correlation into a tight evolutionary relationship. We illustrate the extent of this evolutionary entanglement of amyloid propensity and globular protein stability. Suppressing a 600-Ma-conserved amyloidogenic segment in the p53 core domain fold is structurally feasible but requires 7-bp substitutions to concomitantly introduce two aggregation-suppressing and three stabilizing amino acid mutations. We speculate that, rather than being a corollary of protein evolution, it is equally plausible that positive selection for amyloid structure could have been a driver for the emergence of globular protein structure.
Affiliation(s)
- Tobias Langenberg
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Rodrigo Gallardo
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Rob van der Kant
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Nikolaos Louros
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Emiel Michiels
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Ramon Duran-Romaña
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Bert Houben
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Rafaela Cassio
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Hannah Wilkinson
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Teresa Garcia
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Chris Ulens
- Laboratory of Structural Neurobiology, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Joost Van Durme
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
- Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, Herestraat 49, 3000 Leuven, Belgium; Switch Laboratory, Department of Cellular and Molecular Medicine, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
100
Paladin L, Schaeffer M, Gaudet P, Zahn-Zabal M, Michel PA, Piovesan D, Tosatto SCE, Bairoch A. The Feature-Viewer: a visualization tool for positional annotations on a sequence. Bioinformatics 2020; 36:3244-3245. [DOI: 10.1093/bioinformatics/btaa055] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/26/2019] [Revised: 01/02/2020] [Accepted: 01/20/2020] [Indexed: 01/15/2023]
Abstract
Summary
The Feature-Viewer is a lightweight library for the visualization of biological data mapped to a protein or nucleotide sequence. It is designed for ease of use while allowing full customization. The library is already used by several biological data resources and allows intuitive visual mapping of the full spectrum of sequence features for different usages.
Availability and implementation
The Feature-Viewer is open source, compatible with state-of-the-art development technologies and responsive, also for mobile viewing. Documentation and usage examples are available online.
Affiliation(s)
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- Mathieu Schaeffer
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Pascale Gaudet
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Monique Zahn-Zabal
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Pierre-André Michel
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- CNR Institute of Neuroscience, Padova 35121, Italy
- Amos Bairoch
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland