1
Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024; 173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Alkaliphilic proteins have great potential as biocatalysts in biotechnology, especially for enzyme engineering. Extensive research has focused on exploring the enzymatic potential of alkaliphiles and characterizing alkaliphilic proteins. However, current methods for identifying these proteins rely on wet-lab experiments, which are time-consuming, labor-intensive, and expensive. A computational method for alkaliphilic protein identification would therefore be invaluable for protein engineering and design. In this study, we present a novel approach that uses embeddings from the protein language model ESM-2(3B) in a deep learning framework to classify alkaliphilic and non-alkaliphilic proteins. To our knowledge, this is the first attempt to employ embeddings from a pre-trained protein language model to classify alkaliphilic proteins. A reliable dataset comprising 1,002 alkaliphilic and 1,866 non-alkaliphilic proteins was constructed for training and testing the proposed model. The proposed model, dubbed ALPACA, achieves an accuracy of 0.88, an F1-score of 0.84, and a Matthews correlation coefficient of 0.75 on the independent dataset. ALPACA is likely to serve as a valuable resource for exploring protein alkalinity and its role in protein design and engineering.
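The three scores reported above (accuracy, F1-score and Matthews correlation coefficient) follow the standard confusion-matrix definitions. Below is a minimal Python sketch. It assumes binary labels with "alkaliphilic" as the positive class. The function name and the counts in the usage line are hypothetical and are not ALPACA's data.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, F1-score and Matthews correlation coefficient from a 2x2 confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only
print(binary_metrics(tp=8, fp=2, tn=9, fn=1))
```

MCC is a common companion to accuracy here because the dataset is imbalanced (1,002 vs 1,866 proteins), and MCC penalises a classifier that favours the majority class.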
2
Relation between flexibility and intrinsically disorder regions in thermosensitive TRP channels reveal allosteric effects. Eur Biophys J 2024; 53:77-90. [PMID: 37777680 DOI: 10.1007/s00249-023-01682-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/06/2023] [Accepted: 08/20/2023] [Indexed: 10/02/2023]
Abstract
How a protein propagates conformational changes throughout its structure remains largely unknown. In thermosensitive TRP channels, this allosteric communication is triggered by ligand binding or by temperature changes. Because dynamic allostery implies a dynamic role for disordered regions, we set out to thoroughly evaluate these regions in six thermosensitive TRP channels. By contrasting the intrinsic flexibility of the transmembrane region with its degree of disorder in these proteins, we identified several residues for which the two parameters are not directly correlated. This structural discrepancy revealed residues that have been reported to be dynamic or functionally relevant, or that are involved in signal propagation and are probably part of allosteric networks. These discrepant, potentially dynamic regions are not exclusive to TRP channels, as the same discrepancy was found in the Kv Shaker channel.
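The abstract does not state how the lack of direct correlation was measured. One way to operationalize it, assumed here purely for illustration, is to fit a least-squares line of per-residue flexibility against per-residue disorder and then flag residues with unusually large residuals. The function name, the z-score cutoff of 2.0 and the toy profiles are hypothetical. This is not the authors' procedure.

```python
from statistics import mean, pstdev


def discrepant_residues(disorder, flexibility, z_cutoff=2.0):
    """Return 1-based positions whose flexibility deviates strongly from a linear fit on disorder.

    Assumption (illustrative only): 'discrepancy' is the residual of an ordinary
    least-squares fit, scaled by the population standard deviation of all residuals.
    """
    if len(disorder) != len(flexibility) or len(disorder) < 3:
        raise ValueError("need two equal-length per-residue profiles (>= 3 residues)")
    mx, my = mean(disorder), mean(flexibility)
    sxx = sum((x - mx) ** 2 for x in disorder)
    if sxx == 0:
        raise ValueError("disorder profile is constant; slope is undefined")
    slope = sum((x - mx) * (y - my) for x, y in zip(disorder, flexibility)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(disorder, flexibility)]
    sd = pstdev(residuals)
    if sd < 1e-12:
        return []  # (near-)perfect linear relationship: nothing is discrepant
    return [i + 1 for i, r in enumerate(residuals) if abs(r) / sd > z_cutoff]


# Toy profiles: flexibility tracks disorder except at residue 6
disorder = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
flexibility = [0, 1, 2, 3, 4, 20, 6, 7, 8, 9]
print(discrepant_residues(disorder, flexibility))  # [6]
```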
3
Expanding the Landscape of Amyloid Sequences with CARs-DB: A Database of Polar Amyloidogenic Peptides from Disordered Proteins. Methods Mol Biol 2024; 2714:171-185. [PMID: 37676599 DOI: 10.1007/978-1-0716-3441-7_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Several databases collecting amyloidogenic regions have been released to provide information on protein sequences able to form amyloid fibrils. However, most of these resources are built with data from experiments that detect highly hydrophobic stretches located within transiently exposed protein segments. We recently demonstrated that cryptic amyloidogenic regions (CARs) of polar nature have the potential to form amyloid fibrils in vitro. Given the underrepresentation of such sequences in current amyloid databases, we developed CARs-DB, the first repository to collect thousands of predicted CARs from intrinsically disordered regions. This protocol chapter describes how to use CARs-DB to search for sequences of interest that might be connected to disease or to functional protein-protein interactions. In addition, we provide case studies to illustrate the database's features. CARs-DB is readily accessible at http://carsdb.ppmclab.com/ .
4
Overlaps Between CDS Regions of Protein-Coding Genes in the Human Genome: A Case Study on the NR1D1-THRA Gene Pair. J Mol Evol 2023; 91:963-975. [PMID: 38006429 DOI: 10.1007/s00239-023-10147-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 11/12/2023] [Indexed: 11/27/2023]
Abstract
For several decades, it has been known that a substantial number of genes in human DNA overlap; however, the biological and evolutionary significance of these overlaps remains poorly understood. This study focused on overlaps in which the overlapping DNA region encompasses the coding DNA sequences (CDSs) of protein-coding genes. The results revealed that proteins encoded by overlapping CDSs are more disordered than those encoded by nonoverlapping CDSs. Additionally, these DNA regions were found to be GC-rich, which could be partially attributed to the need to avoid stop codons in two distinct reading frames rather than one. Furthermore, these regions harbour fewer single-nucleotide polymorphism (SNP) sites, possibly because of constraints arising from the overlapping state, in which a mutation can affect two genes simultaneously. While elucidating these properties, the NR1D1-THRA gene pair emerged as an exceptional case, with highly structured proteins and a distinctly conserved sequence across eutherian mammals. Both NR1D1 and THRA are nuclear receptors lacking a ligand-binding domain at their C-terminus, which is the region where this gene pair overlaps. NR1D1 is involved in the regulation of circadian rhythm, while THRA encodes a thyroid hormone receptor; both play crucial roles in various physiological processes. This study suggests that, in addition to their well-established functions, the overlapping CDS regions of these genes may encode protein segments with additional, as yet undiscovered, biological roles.
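The GC bias has a simple mechanical reading. All three standard stop codons (TAA, TAG, TGA) are AT-rich, so a stretch that must remain open in two reading frames cannot accumulate them in either frame. The Python sketch below reports GC content and lists which forward reading frames of a DNA string are free of stop codons. The function names and the toy sequence are hypothetical. As a simplifying assumption, it checks only the forward strand and uses only the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def stop_free_frames(seq: str) -> list[int]:
    """Return the forward frame offsets (0, 1, 2) whose complete codons contain no stop codon."""
    seq = seq.upper()
    open_frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            open_frames.append(offset)
    return open_frames


toy = "ATGATAAGC"  # hypothetical sequence, not taken from NR1D1 or THRA
print(stop_free_frames(toy), round(gc_content(toy), 3))  # [0, 2] 0.333
```

For a segment that encodes two same-strand overlapping CDSs in different frames, both frame offsets would need to appear in the returned list.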
5
Disorder and amino acid composition in proteins: their potential role in the adaptation of extracellular pilins to the acidic media, where Acidithiobacillus thiooxidans grows. Extremophiles 2023; 27:31. [PMID: 37848738 DOI: 10.1007/s00792-023-01317-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 09/26/2023] [Indexed: 10/19/2023]
Abstract
There are few biophysical studies or structural characterizations of the type IV pilin system of extremophilic bacteria such as the acidophilic Acidithiobacillus thiooxidans. We set out to analyze the proteins that make up their pili, the pilins, because these extracellular proteins are in constant interaction with protons of the acidic medium in which At. thiooxidans grows. We used the Operon Mapper web server to identify and analyze the gene cluster encoding the minor pilins of At. thiooxidans. In addition, we carried out an in silico characterization of these pilins using the VL-XT algorithm of the PONDR® server. Our results showed that structural disorder is more prevalent in the pilins of At. thiooxidans than in those of non-acidophilic bacteria. Further computational characterization showed that the pilins of At. thiooxidans are significantly enriched in hydroxy (serine and threonine) and amide (glutamine and asparagine) residues and significantly depleted in charged residues (aspartic acid, glutamic acid, arginine and lysine). Similar results were obtained when pilins from other Acidithiobacillus species, and from acidophilic bacteria of other genera, were compared with those of neutrophilic bacteria. This suggests that these properties are intrinsic to pilins from acidic environments, most likely because they help maintain solubility and stability under harsh conditions. These results provide guidelines for the application of extracellular proteins of acidophiles in protein engineering.
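The enrichment and depletion reported above can be expressed as per-sequence class fractions. Below is a minimal Python sketch, assuming one-letter protein sequences. The function name and the toy sequence are hypothetical. The abstract does not state which statistical test was used to compare acidophilic and neutrophilic pilin sets, so that test is not reproduced here.

```python
# Residue classes named in the abstract (one-letter codes)
RESIDUE_CLASSES = {
    "hydroxy": frozenset("ST"),    # serine, threonine
    "amide": frozenset("NQ"),      # asparagine, glutamine
    "charged": frozenset("DEKR"),  # aspartate, glutamate, lysine, arginine
}


def class_fractions(seq: str) -> dict[str, float]:
    """Fraction of residues in a protein sequence that belong to each class."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        name: sum(residue in members for residue in seq) / len(seq)
        for name, members in RESIDUE_CLASSES.items()
    }


print(class_fractions("STNQDERK"))  # {'hydroxy': 0.25, 'amide': 0.25, 'charged': 0.5}
```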
6
Solution structure and behaviour of the Arabidopsis thaliana HYL1 protein. Biochim Biophys Acta Gen Subj 2023; 1867:130376. [PMID: 37150226 DOI: 10.1016/j.bbagen.2023.130376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 04/14/2023] [Accepted: 05/02/2023] [Indexed: 05/09/2023]
Abstract
In plants, microRNA biogenesis involves a complex assembly of molecular processes that are mostly governed by three proteins: the RNase III protein DCL1 and two RNA-binding proteins, SERRATE and HYL1. HYL1 is a double-stranded RNA-binding protein that is needed for the precise excision of the miRNA/miRNA* duplex from stem-loop-containing primary miRNA gene transcripts. Moreover, HYL1 partners with HSP90 and CARP9 to load miRNA molecules onto the AGO1 endonuclease. As a crucial player in the biogenesis pathway, HYL1 is regulated by its phosphorylation status, which fine-tunes miRNA levels under various physiological conditions. HYL1 consists of two dsRNA-binding domains (dsRBDs), which are involved in RNA binding and dimerization, and a C-terminal disordered tail of unknown function. Although the spatial structures of the individual dsRBDs have been determined, there is a lack of information about the behaviour and structure of the full-length protein. Using small-angle X-ray scattering (SAXS), we investigated the structure and dynamics of the HYL1 protein from Arabidopsis thaliana in solution. We show that the C-terminal domain is disordered and dynamic in solution and that HYL1 dimerization is concentration-dependent. HYL1 lacking the C-terminal tail and a nuclear localisation signal (NLS) fragment is almost exclusively monomeric and, like the full-length protein, is dynamic in solution. Our results point, for the first time, to a role of the C-terminal fragment in stabilising HYL1 dimer formation.
7
Vital for Viruses: Intrinsically Disordered Proteins. J Mol Biol 2023; 435:167860. [PMID: 37330280 PMCID: PMC10656058 DOI: 10.1016/j.jmb.2022.167860] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/14/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 06/19/2023]
Abstract
Viruses infect all kingdoms of life; their genomes vary from DNA to RNA and in size from 2 kb to 1 Mb or more. Viruses frequently employ intrinsically disordered proteins (IDPs), that is, protein products of viral genes that do not themselves fold into independent three-dimensional structures. Instead, these proteins constitute a versatile molecular toolkit for accomplishing a range of functions necessary for viral infection, assembly, and proliferation. Interestingly, disordered proteins have been discovered in almost all viruses studied so far, whether the viral genome consists of DNA or RNA, and whatever the configuration of the viral capsid or other outer covering. In this review, I present a wide-ranging set of stories illustrating the range of functions of IDPs in viruses. The field is rapidly expanding, and I have not tried to include everything; what is included is meant to be a survey of the variety of tasks that viruses accomplish using disordered proteins.
8
Alternative splicing level related to intron size and organism complexity. BMC Genomics 2021; 22:853. [PMID: 34819032 PMCID: PMC8614042 DOI: 10.1186/s12864-021-08172-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2021] [Accepted: 11/12/2021] [Indexed: 12/25/2022]
Abstract
Background Alternative splicing is the process of selecting different combinations of splice sites to produce variably spliced mRNAs. However, the relationships between alternative splicing prevalence and level (ASP/L) and variations in intron size and organism complexity (OC) remain vague. Here, we developed a robust protocol to analyze these relationships. Approximately 8 Tb of raw RNA-Seq data from 37 eumetazoan species were divided into three sets of species based on variations in intron size and OC.
Results We found a strong positive correlation between ASP/L and OC, but no correlation between ASP/L and intron size across species. Surprisingly, ASP/L displayed a positive correlation with the mean intron size of genes within individual genomes. Moreover, our results revealed that four ASP/L-related pathways contributed to the OC-associated differences in ASP/L. In particular, the spliceosome pathway displayed distinct genomic features, such as the highest gene expression level, conservation level, and fraction of disordered regions. Interestingly, weak or no obvious correlations were observed among these genomic features.
Conclusions The positive correlation between ASP/L and OC exists ubiquitously in eukaryotes and is not affected by the mean intron size of these species. ASP/L-related splicing factors may play an important role in the evolution of OC.
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08172-2.
9
DeepREx-WS: A web server for characterising protein-solvent interaction starting from sequence. Comput Struct Biotechnol J 2021; 19:5791-5799. [PMID: 34765094 PMCID: PMC8566768 DOI: 10.1016/j.csbj.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022]
Abstract
Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or only partially solved. We can now integrate the notion of solvent-exposed/buried residues with that of their flexibility and intrinsic disorder. This highlights regions where mutations may increase or decrease protein stability, so that proteins can be modified for biotechnological purposes while preserving their functional integrity. Here we describe a web server that offers the unique possibility of integrating knowledge of residue solvent exposure with residue conservation, flexibility and disorder along a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the web server is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset of 2332 monomeric protein chains derived from the Protein Data Bank, and benchmarked on a blind test set of 200 protein sequences unrelated to the training set. Results show that DeepREx performs at the state of the art in the field. In turn, the web server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) secondary structure prediction; v) the presence of disordered regions as derived from MobiDB-Lite 3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of DeepREx-WS for assisting the identification of residues to be mutated in protein surface engineering processes.
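The abstract lists local sequence hydrophobicity among the DeepREx-WS features but does not specify the scale or the window. A common choice, assumed here rather than taken from the server, is a sliding-window average over the Kyte-Doolittle hydropathy scale. In this hypothetical sketch, the function name and the default window of 9 residues are illustrative only.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982); assumed scale, not confirmed for DeepREx-WS
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Per-residue mean hydropathy over a centred window, truncated at the termini.

    Raises KeyError for non-standard residue letters.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    seq = seq.upper()
    half = window // 2
    profile = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        profile.append(sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment))
    return profile


print([round(v, 2) for v in hydropathy_profile("MKVLIRDE", window=3)])
```

Truncating the window at the termini keeps one value per residue, so the profile can be placed alongside per-residue exposure predictions.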
10
Role of "dual-personality" fragments in HEV adaptation-analysis of Y-domain region. J Genet Eng Biotechnol 2021; 19:154. [PMID: 34637041 PMCID: PMC8511232 DOI: 10.1186/s43141-021-00238-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/12/2021] [Accepted: 08/30/2021] [Indexed: 01/06/2023]
Abstract
BACKGROUND Hepatitis E is a liver disease caused by the hepatitis E virus (HEV). The largest open reading frame, ORF1, encodes a polyprotein that contains a nonstructural Y-domain region (YDR) whose role in HEV adaptation remains uncharted. Disordered regions in several viral nonstructural proteins have been shown to participate in viral multiplication and in multiple regulatory functions. The intrinsic disorder of YDR, including its structural and functional annotation, was therefore studied comprehensively using computational methods to delineate its role in viral adaptation.
RESULTS Our findings showed that YDR contains significantly more ordered regions, with a lower prevalence of disordered residues. Sequence-based analysis classified YDR as a "dual personality" (DP) protein because it contains both structured and unstructured (intrinsically disordered) regions. The evolution of YDR was shaped by pressures that led towards a predominance of amino acids characteristic of both disordered and regularly folded regions (Ala, Arg, Gly, Ile, Leu, Phe, Pro, Ser, Tyr, Val). Additionally, the predominance of characteristic DP residues (Thr, Arg, Gly, and Pro) further reflected the combined order and disorder characteristics of YDR. Intrinsic disorder propensity analysis classified YDR as a moderately disordered protein. All YDR sequences contained molecular recognition features (MoRFs), i.e., intrinsic disorder-based protein-protein interaction (PPI) sites, in addition to several nucleotide-binding sites. The presence of molecular recognition sites (PPI, RNA binding, and DNA binding) thus indicates that YDR interacts with specific partners and with host membranes, leading to further viral infection. The presence of various disorder-based phosphorylation sites further indicates a role for YDR in various biological processes. Furthermore, functional annotation revealed YDR to be a multifunctional protein, owing to its propensity to bind a wide range of ligands and its involvement in various catalytic activities.
CONCLUSIONS Because DP proteins are targets for regulation, YDR contributes to cellular signaling processes through PPIs. As YDR is incompletely understood, our data on its disorder-based functions could help clarify its associated functions. Collectively, this comprehensive investigation is the first attempt to delineate the role of YDR in the regulation and pathogenesis of HEV.
11
Cryptic amyloidogenic regions in intrinsically disordered proteins: Function and disease association. Comput Struct Biotechnol J 2021; 19:4192-4206. [PMID: 34527192 PMCID: PMC8349759 DOI: 10.1016/j.csbj.2021.07.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/01/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/21/2022]
Abstract
The amyloid conformation is considered a fundamental state of proteins, and the propensity to populate it is a generic property of polypeptides. Multiple proteome-wide analyses have addressed the presence of amyloidogenic regions in proteins, nurturing our understanding of their nature and biological implications. However, these analyses focused on highly aggregation-prone, hydrophobic stretches that are only marginally found in intrinsically disordered regions (IDRs). Here, we explore the prevalence of cryptic amyloidogenic regions (CARs) of polar nature in IDRs. CARs are widespread in IDRs and are associated with IDP function, with particular involvement in protein–protein interactions, but their presence is also connected to a risk of malfunction. By exploring this function/malfunction dichotomy, we speculate that ancestral CARs might have evolved into functional interacting regions that played a significant role in protein evolution at the origins of life.
Key Words
- APR, Aggregation-prone region
- Aggregation
- Amyloid
- CARs, Cryptic amyloidogenic regions
- CD, Circular dichroism
- CR, Congo red
- Evolution
- FTIR, Fourier transform infrared
- IDPs, Intrinsically disordered proteins
- IDRs, Intrinsically disordered regions
- Intrinsically disordered proteins
- PBS, Phosphate-buffered saline
- PPI, Protein-protein interactions
- Protein disorder
- Protein–protein interactions
- Rb, Retinoblastoma associated proteins
- RbC, Core region of Rb
- TEM, Transmission electron microscopy
- Th-T, Thioflavin-T
12
Variations in Orf3a protein of SARS-CoV-2 alter its structure and function. Biochem Biophys Rep 2021; 26:100933. [PMID: 33527091 PMCID: PMC7839395 DOI: 10.1016/j.bbrep.2021.100933] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/09/2020] [Revised: 01/18/2021] [Accepted: 01/22/2021] [Indexed: 12/11/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread rapidly worldwide and acquired multiple mutations in its genome. Orf3a, an accessory protein encoded by the SARS-CoV-2 genome, plays a significant role in viral infection and pathogenesis. In the present in silico study, 15,928 Orf3a sequences reported worldwide were compared to identify variations in this protein. Our analysis revealed mutations at 173 residues of the Orf3a protein. Subsequent protein modelling revealed twelve mutations that can considerably affect the stability of Orf3a. Among these twelve mutations, three (Y160H, D210Y and S171L) also alter the secondary structure and disorder parameters of the Orf3a protein. Further, we used predictive tools to identify five promising B-cell epitopes that reside in the mutated regions of Orf3a. Altogether, our study sheds light on variations in Orf3a that might contribute to alterations in protein structure and function.
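Mutation names such as Y160H follow the pattern reference residue, position, variant residue. Below is a minimal Python sketch of how such substitutions can be listed from a pre-computed alignment of a variant against a reference sequence. The function name and the toy sequences are hypothetical. The alignment step itself is assumed to happen upstream, and ambiguous residues are handled only by skipping 'X'.

```python
def call_substitutions(reference: str, variant: str) -> list[str]:
    """List substitutions (e.g. 'Y160H') between two aligned protein sequences.

    Positions are numbered on the ungapped reference. Columns where the reference
    has a gap (insertions) are skipped, as are deletions ('-') and ambiguous
    residues ('X') in the variant.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be rows of the same alignment")
    substitutions = []
    ref_pos = 0
    for ref_aa, var_aa in zip(reference.upper(), variant.upper()):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference coordinate
        ref_pos += 1
        if var_aa in ("-", "X") or var_aa == ref_aa:
            continue  # deletion, ambiguous residue, or no change
        substitutions.append(f"{ref_aa}{ref_pos}{var_aa}")
    return substitutions


print(call_substitutions("MD-LFMRIF", "MDKLYMRIL"))  # ['F4Y', 'F8L']
```

Counting how many of the compared sequences carry each substitution would then give the per-residue mutation frequencies on which an analysis like the one above depends.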
13
Insight into membraneless organelles and their associated proteins: Drivers, Clients and Regulators. Comput Struct Biotechnol J 2021; 19:3964-3977. [PMID: 34377363 PMCID: PMC8318826 DOI: 10.1016/j.csbj.2021.06.042] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/14/2021] [Revised: 06/26/2021] [Accepted: 06/27/2021] [Indexed: 02/06/2023]
Abstract
In recent years, attention has been devoted to proteins forming immiscible liquid phases within the liquid intracellular medium, commonly referred to as membraneless organelles (MLOs). These organelles enable spatiotemporal associations of cellular components that exchange dynamically with the cellular milieu. Dysregulation of these liquid-liquid phase separation (LLPS) processes may cause various diseases, including neurodegenerative pathologies and cancer. Until very recently, databases containing information on proteins forming MLOs, as well as tools and resources facilitating their analysis, were missing. This has changed with the publication of four databases that focus on different types of experiments, sets of proteins, inclusion criteria, and levels of annotation or curation. In this study, we integrate and analyze the information across these databases, complement their records, and produce a consolidated set of proteins that enables the investigation of the LLPS phenomenon. To gain insight into the features that characterize different types of MLOs and the roles of their associated proteins, the proteins were grouped into categories according to their annotated functions: High Confidence MLO-associated (including Drivers and reviewed proteins), Potential Clients, and Regulators. We show that none of the databases taken alone covers the data sufficiently to enable meaningful analysis. This validates our integration effort as essential for a better understanding of phase separation and lays the foundations for discovering new proteins potentially involved in this important cellular process. Lastly, we developed a server that enables customized selections of different sets of proteins based on MLO location, database, and disorder content, among other attributes (https://forti.shinyapps.io/mlos/).
14
Predicting substitutions to modulate disorder and stability in coiled-coils. BMC Bioinformatics 2020; 21:573. [PMID: 33349244 PMCID: PMC7751101 DOI: 10.1186/s12859-020-03867-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/20/2022]
Abstract
Background Coiled-coils are described as stable structural motifs in which two or more helices wind around each other. However, coiled-coils are also associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by a lack of stable secondary and tertiary structure under physiological conditions in vitro, and they are increasingly recognized as important for protein function. However, characterizing their behaviour in solution and precisely determining the extent of disorder of a protein region remains challenging, both experimentally and computationally.
Results In this work, we propose a computational framework to quantify the extent of disorder within a coiled-coil in solution and to help design substitutions that modulate such disorder. Our method relies on the analysis of conformational ensembles generated by relatively short all-atom Molecular Dynamics (MD) simulations. We apply it to the phosphoprotein multimerisation domains (PMDs) of Measles virus (MeV) and Nipah virus (NiV), both of which form tetrameric left-handed coiled-coils. We show that our method can help quantify the extent of disorder of the C-terminal region of the MeV and NiV PMDs from MD simulations of a few tens of nanoseconds, without requiring extensive exploration of the conformational space. Moreover, this study provides a conceptual framework for the rational design of substitutions aimed at modulating the stability of coiled-coils. By assessing the impact of four substitutions known to destabilize coiled-coils, we derive a set of rules to control MeV PMD structural stability and cohesiveness. We therefore design two contrasting substitutions, one increasing the stability of the tetramer and the other increasing its flexibility.
Conclusions Our method can be considered a platform for reasoning about how to design substitutions aimed at regulating flexibility and stability.
15
Splicing-accessible coding 3'UTRs control protein stability and interaction networks. Genome Biol 2020; 21:186. [PMID: 32727563 PMCID: PMC7392665 DOI: 10.1186/s13059-020-02102-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/29/2020] [Accepted: 07/14/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND 3'-Untranslated regions (3'UTRs) play crucial roles in mRNA metabolism, for example by controlling mRNA stability, translation efficiency, and localization. Intriguingly, in some genes the 3'UTR is longer than the coding region, pointing to additional, unknown functions. Here, we describe a protein-coding function of 3'UTRs upon frameshift-inducing alternative splicing in more than 10% of human and mouse protein-coding genes.
RESULTS 3'UTR-encoded amino acid sequences show an enrichment of PxxP motifs and lead to interactome rewiring. Furthermore, an elevated proline content increases protein disorder and reduces protein stability, thus allowing splicing-controlled regulation of protein half-life. This could also act as a surveillance mechanism for erroneous skipping of penultimate exons that would otherwise produce transcripts escaping nonsense-mediated decay. The impact of frameshift-inducing alternative splicing on disease development is emphasized by a retinitis pigmentosa-causing mutation that leads to translation of a 3'UTR-encoded, proline-rich, destabilized frameshift protein with altered protein-protein interactions.
CONCLUSIONS We describe a widespread, evolutionarily conserved mechanism that enriches the mammalian proteome, controls protein expression and protein-protein interactions, and has important implications for the discovery of novel, potentially disease-relevant protein variants.
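A PxxP motif is a proline, followed by any two residues, followed by a second proline. The sketch below, which uses hypothetical function names and a toy sequence, scans for possibly overlapping occurrences with a regular-expression lookahead and also reports the proline fraction. It only illustrates the motif definition. It does not reproduce the motif-enrichment statistics used in the study.

```python
import re

# Zero-width lookahead so that overlapping motifs (e.g. both hits in "PPAPP") are reported
PXXP = re.compile(r"(?=P..P)")


def pxxp_starts(seq: str) -> list[int]:
    """1-based start positions of every PxxP motif in a protein sequence."""
    return [m.start() + 1 for m in PXXP.finditer(seq.upper())]


def proline_fraction(seq: str) -> float:
    """Fraction of proline residues (0.0 for an empty sequence)."""
    return seq.upper().count("P") / len(seq) if seq else 0.0


toy = "APPAPPA"  # hypothetical sequence
print(pxxp_starts(toy), round(proline_fraction(toy), 3))  # [2, 3] 0.571
```

A plain `re.findall(r"P..P", ...)` would miss the second, overlapping hit, so the lookahead is the design choice that matters here.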
16
Disorder for Dummies: Functional Mutagenesis of Transient Helical Segments in Disordered Proteins. Methods Mol Biol 2020. [PMID: 32696350 DOI: 10.1007/978-1-0716-0524-0_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Most cytosolic eukaryotic proteins contain a mixture of ordered and disordered regions. Disordered regions facilitate cell signaling by concentrating sites for posttranslational modifications (PTMs) and protein-protein interactions into arrays of short linear motifs that can be reorganized by RNA splicing. The evolution of disordered regions differs from that of their ordered counterparts. In some cases, selection is focused on maintaining protein binding interfaces and PTM sites, but sequence heterogeneity is common. In other cases, simple properties such as charge, length, or end-to-end distance are maintained. Many disordered protein binding sites contain some transient secondary structure that may resemble the structure of the bound state. α-Helical secondary structure is common, and a wide range of fractional helicity is observed in different disordered regions. Here we provide a simple protocol to identify transient helical segments and to design mutants that can change their structure and function.
17
Data set of intrinsically disordered proteins analysed at a local protein conformation level. Data Brief 2020; 29:105383. [PMID: 32195305 PMCID: PMC7078294 DOI: 10.1016/j.dib.2020.105383] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/05/2020] [Revised: 02/24/2020] [Accepted: 02/27/2020] [Indexed: 10/26/2022]
Abstract
Intrinsically Disordered Proteins (IDPs) have become a hot topic since their characterisation in the 1990s. The data presented in this article are related to our research article entitled "A structural entropy index to analyse local conformations in Intrinsically Disordered Proteins", published in Journal of Structural Biology [1]. In that study, we quantified, for the first time, the continuum from rigidity to flexibility and finally to disorder. Non-disordered regions were also highlighted in the ensembles of disordered proteins. This work used the Protein Ensemble Database (PED), a database collecting ensembles of protein structures considered to be IDPs. The data set consists of a collection of cleaned protein files in classical PDB format that can readily be used as input for most automatic analysis software. The accompanying data include the encoding of all structural information in terms of a structural alphabet, namely Protein Blocks (PBs). Also included is an entropy index derived from PBs that captures the continuum from protein rigidity through flexibility to disorder, together with information from secondary structure assignment, protein accessibility and disorder prediction from the sequences. The data may be used for further structural bioinformatics studies of IDPs, and also as a benchmark for evaluating disorder prediction methods.
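The abstract does not give the formula for the entropy index. A common way to summarise Protein Block variability across an ensemble is the per-position Shannon entropy and its exponential, Neq, the "equivalent number" of PBs. Neq ranges from 1 when a single block is conserved to 16 when all sixteen blocks (usually labelled a to p) are equally likely. The sketch below computes both quantities, assuming each model of an ensemble has already been encoded as a PB string of equal length. The function names and the toy ensemble are hypothetical, and the result is not necessarily the index defined by the authors.

```python
import math
from collections import Counter


def pb_entropy(pb_strings: list[str]) -> list[float]:
    """Per-position Shannon entropy (natural log) of Protein Block letters across an ensemble.

    Assumes every model is already encoded as a PB string of the same length.
    """
    if not pb_strings:
        raise ValueError("need at least one PB string")
    length = len(pb_strings[0])
    if any(len(s) != length for s in pb_strings):
        raise ValueError("all PB strings must have the same length")
    n = len(pb_strings)
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in pb_strings)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return entropies


def neq(entropies: list[float]) -> list[float]:
    """Equivalent number of PBs, exp(H): 1 for one conserved block, at most 16."""
    return [math.exp(h) for h in entropies]


# Hypothetical 4-model ensemble covering 3 positions
models = ["mmd", "mmc", "mnb", "mna"]
print([round(v, 2) for v in neq(pb_entropy(models))])  # [1.0, 2.0, 4.0]
```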
18
In silico prediction of structural changes in human papillomavirus type 16 (HPV16) E6 oncoprotein and its variants. BMC Mol Cell Biol 2019; 20:35. [PMID: 31426742 PMCID: PMC6700771 DOI: 10.1186/s12860-019-0217-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/24/2019] [Accepted: 08/08/2019] [Indexed: 12/11/2022]
Abstract
Background: HPV16 infection is one of the main risk factors for the development of cervical cancer, mainly because of the high oncogenic potential of the viral proteins E6 and E7, which are involved in different processes of malignant transformation. E6 shows a broad spectrum of intratypical variation, which is reflected in its high diversity, biological behavior, global distribution and risk of causing cervical cancer. Experimental studies have shown that the European (E-G350, E-A176/G350, E-C188/G350) and Asian-American (AAa and AAc) intratypical variants of E6 are capable of inducing the differential expression of genes involved in the development of cervical cancer.
Results: An in silico analysis was performed to characterize the molecular effects of these variations using the structure of the HPV16 E6 oncoprotein (PDB: 4XR8; chain H) as a template. In particular, we evaluated the 3D structures of the intratypical variants by structural alignment, ERRAT, Ramachandran plots and prediction of protein disorder, and these results were further validated by molecular dynamics simulations. In general, our results showed no significant changes in the 3D structure of the protein. However, we observed subtle changes in protein physicochemical features and in structural disorder at the N- and C-termini.
Conclusions: Our results showed that mutations in the viral oncogene E6 of six high-risk HPV16 variants are effectively neutral and do not cause significant structural changes, except for slight variations in structural disorder. As structural disorder is involved in rewiring protein-protein interactions, these results suggest a differential pattern of interaction of E6 with the target protein p53 and possibly different patterns of tumor aggressiveness associated with certain E6 variants.
Electronic supplementary material: The online version of this article (10.1186/s12860-019-0217-0) contains supplementary material, which is available to authorized users.
19
Disorder Atlas: Web-based software for the proteome-based interpretation of intrinsic disorder predictions. Comput Biol Chem 2019; 83:107090. [PMID: 31326853 DOI: 10.1016/j.compbiolchem.2019.107090] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/01/2018] [Revised: 12/05/2018] [Accepted: 07/10/2019] [Indexed: 11/18/2022]
Abstract
Intrinsically disordered proteins lack a stable three-dimensional structure under physiological conditions. While this property has gained considerable interest within the past two decades, disorder poses substantial challenges to experimental characterization efforts. As a result, numerous computational tools have been developed to predict disorder from primary sequence; however, interpreting the output of these algorithms remains a challenge. To begin to bridge this gap, we present Disorder Atlas, web-based software that facilitates the interpretation of intrinsic disorder predictions using proteome-based descriptive statistics. The service also supports large-scale systematic exploratory searches for proteins with disorder features of interest, and it allows users to browse the prevalence of multiple disorder features at the proteome level. Disorder Atlas thus provides a user-friendly tool that places algorithm-generated disorder predictions in the context of the proteome, making it possible to compare the results for a query protein against predictions made for an entire population. Disorder Atlas currently supports ten eukaryotic proteomes and is freely available for non-commercial users at http://www.disorderatlas.org.
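Placing a single disorder prediction in proteome context amounts to asking where the query's value falls in the distribution of the same feature across the proteome. This is a minimal sketch. It assumes that per-protein disorder fractions (the share of residues predicted to be disordered) have already been computed for every protein in a proteome. The function names and numbers are hypothetical and do not reflect how Disorder Atlas is actually implemented.

```python
from bisect import bisect_left, bisect_right
from typing import List


def disorder_fraction(per_residue_disordered: List[bool]) -> float:
    """Fraction of residues flagged as disordered by some predictor."""
    if not per_residue_disordered:
        raise ValueError("empty prediction")
    return sum(per_residue_disordered) / len(per_residue_disordered)


def proteome_percentile(query_value: float, proteome_values: List[float]) -> float:
    """Percentile rank of query_value among proteome_values.
    Ties count as half, which is the 'mean' definition of percentile rank."""
    if not proteome_values:
        raise ValueError("empty proteome")
    ranked = sorted(proteome_values)
    below = bisect_left(ranked, query_value)
    ties = bisect_right(ranked, query_value) - below
    return 100.0 * (below + 0.5 * ties) / len(ranked)


if __name__ == "__main__":
    # Hypothetical proteome-wide disorder fractions
    proteome = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40, 0.55, 0.70, 0.90]
    query = disorder_fraction([True, True, False, True, False, False, True, True])
    print(query, proteome_percentile(query, proteome))  # 0.625 80.0
```

A percentile is easier to interpret than a raw disorder fraction, because the same fraction can be typical in one proteome and unusual in another.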
20
The molecular pathogenesis of superoxide dismutase 1-linked ALS is promoted by low oxygen tension. Acta Neuropathol 2019; 138:85-101. [PMID: 30863976 PMCID: PMC6570705 DOI: 10.1007/s00401-019-01986-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/12/2018] [Revised: 02/25/2019] [Accepted: 03/01/2019] [Indexed: 12/13/2022]
Abstract
Mutations in superoxide dismutase 1 (SOD1) cause amyotrophic lateral sclerosis (ALS). Disease pathogenesis is linked to destabilization, disorder and aggregation of the SOD1 protein. However, the non-genetic factors that promote disorder and the subsequent aggregation of SOD1 have not been studied. Although it is mainly located in the reducing cytosol, mature SOD1 contains an oxidized disulfide bond that is important for its stability. Since O2 is required for formation of the bond, we reasoned that low O2 tension might be a risk factor for the pathological changes associated with ALS development. By applying biochemical approaches to an extensive range of genetically distinct patient-derived cell lines, we show that the disulfide bond is an Achilles heel of the SOD1 protein. Culturing patient-derived fibroblasts, astrocytes, and induced pluripotent stem cell-derived mixed motor neuron and astrocyte cultures (MNACs) under low O2 tensions caused reductive bond cleavage and increases in disordered SOD1. The effects were greatest in cells derived from patients carrying ALS-linked mutations in SOD1. However, significant increases also occurred in wild-type SOD1 in cultures derived from non-disease controls and from patients carrying mutations in other common ALS-linked genes. Compared with fibroblasts, MNACs showed far greater increases in SOD1 disorder and even aggregation of mutant SOD1s, in line with the vulnerability of the motor system to SOD1-mediated neurotoxicity. Our results show for the first time that O2 tension is a principal determinant of SOD1 stability in human patient-derived cells. Furthermore, we provide a mechanism by which non-genetic risk factors for ALS, such as aging and other conditions causing reduced vascular perfusion, could promote disease initiation and progression.
21
SLiM-Enrich: computational assessment of protein-protein interaction data as a source of domain-motif interactions. PeerJ 2018; 6:e5858. [PMID: 30402352 PMCID: PMC6215436 DOI: 10.7717/peerj.5858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/13/2018] [Accepted: 10/02/2018] [Indexed: 01/21/2023]
Abstract
Many important cellular processes involve protein–protein interactions (PPIs) mediated by a Short Linear Motif (SLiM) in one protein interacting with a globular domain in another. Despite their significance, these domain-motif interactions (DMIs) are typically low affinity, which makes them challenging to identify by classical experimental approaches, such as affinity pulldown mass spectrometry (AP-MS) and yeast two-hybrid (Y2H). As a result, DMIs are generally underrepresented in PPI networks. A number of computational methods now exist to predict SLiMs and/or DMIs from experimental interaction data, but it is yet to be established how effective different PPI detection methods are at capturing these low affinity SLiM-mediated interactions. Here, we introduce a new computational pipeline (SLiM-Enrich) to assess how well a given source of PPI data captures DMIs and thus, by inference, how useful that data should be for SLiM discovery. SLiM-Enrich interrogates a PPI network for pairs of interacting proteins in which the first protein is known or predicted to interact with the second protein via a DMI. Permutation tests compare the number of known/predicted DMIs with the distribution expected if the two sets of proteins were randomly associated. This provides an estimate of DMI enrichment within the data and of the false positive rate for individual DMIs. As a case study, we detect significant DMI enrichment in a high-throughput Y2H human PPI study. SLiM-Enrich analysis supports Y2H data as a source of DMIs and highlights the high false positive rates associated with naïve DMI prediction. SLiM-Enrich is available as an R Shiny app. The code is open source and available under a GNU GPL v3 license at: https://github.com/slimsuite/SLiMEnrich. A web server is available at: http://shiny.slimsuite.unsw.edu.au/SLiMEnrich/.
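The permutation logic can be sketched in three steps: count the interacting pairs that match a known or predicted DMI, shuffle which motif protein is paired with which domain protein, and recount many times. This is a simplified sketch and not SLiM-Enrich's own code, which is an R Shiny app. It makes several assumptions. Each PPI is stored as a (motif_protein, domain_protein) pair. DMI knowledge is reduced to a hypothetical lookup from each motif-containing protein to the domain proteins it can bind. The false positive rate is approximated as the mean random count divided by the observed count. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def count_dmis(ppis: List[Tuple[str, str]], dmi_partners: Dict[str, Set[str]]) -> int:
    """Number of (motif_protein, domain_protein) pairs supported by a DMI."""
    return sum(1 for m, d in ppis if d in dmi_partners.get(m, set()))


def dmi_enrichment(
    ppis: List[Tuple[str, str]],
    dmi_partners: Dict[str, Set[str]],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[int, float, float, float]:
    """Return (observed, mean_random, empirical_p, fpr_estimate).
    Random networks keep both protein lists but shuffle the pairing."""
    rng = random.Random(seed)
    observed = count_dmis(ppis, dmi_partners)
    motif_side = [m for m, _ in ppis]
    domain_side = [d for _, d in ppis]
    random_counts = []
    for _ in range(n_perm):
        rng.shuffle(motif_side)  # random re-pairing of motif and domain proteins
        random_counts.append(count_dmis(list(zip(motif_side, domain_side)), dmi_partners))
    mean_random = sum(random_counts) / n_perm
    # Add-one correction so the empirical p-value is never exactly zero
    n_extreme = sum(1 for c in random_counts if c >= observed)
    p_value = (n_extreme + 1) / (n_perm + 1)
    fpr = min(1.0, mean_random / observed) if observed else 1.0
    return observed, mean_random, p_value, fpr


if __name__ == "__main__":
    # Hypothetical toy network: three DMI-supported pairs and one unsupported pair
    ppis = [("A", "X"), ("B", "Y"), ("C", "Z"), ("D", "W")]
    partners = {"A": {"X"}, "B": {"Y"}, "C": {"Z"}}
    print(dmi_enrichment(ppis, partners, n_perm=500, seed=42))
```

The ratio of the mean random count to the observed count approximates the fraction of observed DMIs expected by chance. This is one way to express a false positive rate for individual DMI assignments.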
22
Structure and sequence based functional annotation of Zika virus NS2b protein: Computational insights. Biochem Biophys Res Commun 2017; 492:659-667. [PMID: 28188791 DOI: 10.1016/j.bbrc.2017.02.035] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/30/2016] [Revised: 01/23/2017] [Accepted: 02/06/2017] [Indexed: 02/06/2023]
Abstract
While Zika virus (ZIKV) outbreaks are a growing concern for global health, a deep understanding of the virus is lacking. Here we contribute to the basic science of the virus with a detailed computational analysis of the nonstructural protein NS2b. This protein acts as a cofactor for the NS3 protease (NS3Pro) domain, which is important in the viral life cycle and is an interesting target for drug development. We found that the ZIKV NS2b cofactor is highly similar to those of other viruses within the Flavivirus genus, especially West Nile virus, suggesting that it is essential for protease complex activity. Furthermore, ZIKV NS2b plays an important role in the function and stability of the ZIKV NS3 protease domain, even though it has a low conservation score. In addition, ZIKV NS2b is mostly rigid, which could imply a non-dynamic mode of substrate recognition. Finally, by performing computational alanine-scanning mutagenesis, we found that residues Gly 52 and Asp 83 in NS2b could be important in substrate recognition.
23
Elucidating evolutionary features and functional implications of orphan genes in Leishmania major. INFECTION GENETICS AND EVOLUTION 2015; 32:330-7. [PMID: 25843649 DOI: 10.1016/j.meegid.2015.03.031] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/13/2015] [Revised: 03/25/2015] [Accepted: 03/26/2015] [Indexed: 11/28/2022]
Abstract
Orphan genes are protein-coding genes that lack recognizable homologs in other organisms. These genes have been reported to make up a considerable fraction of the coding regions in all sequenced genomes and are thought to be associated with an organism's lineage-specific traits. However, their evolutionary persistence and functional significance remain elusive. Because they lack homologs in the host genome and probably have lineage-specific functional roles, the orphan gene products of pathogenic protozoa might be considered possible therapeutic targets. Leishmania major is an important parasitic protozoan of the genus Leishmania that is associated with the disease cutaneous leishmaniasis. Therefore, evolutionary and functional characterization of orphan genes in this organism may help in understanding the factors governing pathogen evolution and parasitic adaptation. In this study, we systematically identified the orphan genes of L. major and employed several in silico analyses to understand their evolutionary and functional attributes. To trace the signatures of molecular evolution, we compared their evolutionary rate with that of non-orphan genes. In agreement with prior observations, we found that orphan genes evolve at a higher rate than non-orphan genes. The lower sequence conservation of orphan genes was previously attributed solely to their younger gene age. However, we observed that, together with gene age, a number of genomic factors (such as expression level, GC content and variation in codon usage) and proteomic factors (such as protein length, intrinsic disorder content and hydropathicity) could independently modulate their evolutionary rate. We considered the interplay of all these factors and analyzed their relative contributions to protein evolutionary rate by regression analysis. On the functional level, we observed that orphan genes are associated with regulatory, growth factor and transport-related processes.
Moreover, these genes were found to be enriched in various types of interaction and trafficking motifs, implying their possible involvement in host-parasite interactions. Thus, our comprehensive analysis of L. major orphan genes provides evidence for their extensive roles in host-pathogen interactions and virulence.
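The regression step can be sketched as a multiple linear regression on z-scored variables. In that setting, the magnitude of each standardized coefficient indicates the relative contribution of one factor to evolutionary rate. This is an assumption about the form of the analysis, because the abstract does not specify the regression model. The helper names, factor names, and values below are hypothetical. A real analysis would also need to check for collinearity between factors such as gene age and protein length.

```python
from statistics import mean, pstdev
from typing import Dict, List


def standardize(xs: List[float]) -> List[float]:
    """Convert values to z-scores (population standard deviation)."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        raise ValueError("constant variable cannot be standardized")
    return [(x - mu) / sd for x in xs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_coefficients(y: List[float], predictors: Dict[str, List[float]]) -> Dict[str, float]:
    """OLS on z-scored response and predictors. No intercept is needed because
    every variable is centred. Returns one beta per predictor name."""
    names = list(predictors)
    zy = standardize(y)
    zx = [standardize(predictors[n]) for n in names]
    k = len(names)
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(p * q for p, q in zip(zx[i], zx[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(zx[i], zy)) for i in range(k)]
    return dict(zip(names, solve(xtx, xty)))


if __name__ == "__main__":
    # Hypothetical per-gene evolutionary rates and two candidate factors
    rate = [0.10, 0.22, 0.18, 0.35, 0.30, 0.41]
    factors = {
        "disorder_content": [0.05, 0.20, 0.15, 0.40, 0.30, 0.45],
        "expression_level": [9.0, 6.5, 7.0, 4.0, 5.5, 3.5],
    }
    print(standardized_coefficients(rate, factors))
```

Standardizing every variable puts the coefficients on a common scale, so factors measured in different units, such as GC content and protein length, can be compared directly.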
24
Association between intrinsic disorder and serine/threonine phosphorylation in Mycobacterium tuberculosis. PeerJ 2015; 3:e724. [PMID: 25648268 PMCID: PMC4304846 DOI: 10.7717/peerj.724] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/29/2014] [Accepted: 12/21/2014] [Indexed: 01/28/2023]
Abstract
Serine/threonine phosphorylation is an important mechanism involved in the regulation of protein function. In eukaryotes, phosphorylation occurs predominantly in intrinsically disordered regions of proteins. Although serine/threonine phosphorylation and protein disorder are much less prevalent in prokaryotes, some bacteria, including the medically important M. tuberculosis, have high levels of both. Here I show that serine/threonine phosphorylation sites in M. tuberculosis are highly enriched in intrinsically disordered regions, indicating similarity in the substrate recognition mechanisms of eukaryotic and M. tuberculosis kinases. Serine/threonine phosphorylation has been linked to the pathogenicity and survival of M. tuberculosis. Thus, a better understanding of how its kinases recognize their substrates could have important implications for understanding and controlling the biology of this deadly pathogen. These results also indicate that the association between serine/threonine phosphorylation and disorder is not restricted to eukaryotes.
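Enrichment of phosphosites in disordered regions can be assessed with a 2x2 contingency table of serine/threonine residues. The sketch below computes an odds ratio and a one-sided Fisher's exact p-value from scratch, using the hypergeometric tail. It is an assumption that this matches the statistical test used in the study. The function name and the counts in the example are hypothetical.

```python
from math import comb
from typing import Tuple


def disorder_enrichment(
    phos_disordered: int,
    phos_ordered: int,
    nonphos_disordered: int,
    nonphos_ordered: int,
) -> Tuple[float, float]:
    """Odds ratio and one-sided Fisher's exact p-value for phosphosites being
    over-represented in disordered regions, from a 2x2 table of S/T residues:

                      disordered           ordered
        phospho       phos_disordered      phos_ordered
        not phospho   nonphos_disordered   nonphos_ordered
    """
    a, b, c, d = phos_disordered, phos_ordered, nonphos_disordered, nonphos_ordered
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    row1, col1, n = a + b, a + c, a + b + c + d
    # Hypergeometric tail P(X >= a) with the table margins held fixed
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return odds_ratio, tail / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical S/T residue counts, not data from the study
    or_, p = disorder_enrichment(30, 10, 970, 2990)
    print(f"odds ratio = {or_:.2f}, one-sided p = {p:.2e}")
```

Counting only S/T residues, rather than all residues, prevents compositional differences between ordered and disordered regions from inflating the apparent enrichment.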
25
Ubiquitin-independent proteasomal degradation. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2013; 1843:216-21. [PMID: 23684952 DOI: 10.1016/j.bbamcr.2013.05.008] [Citation(s) in RCA: 169] [Impact Index Per Article: 15.4] [Received: 01/31/2013] [Revised: 05/06/2013] [Accepted: 05/07/2013] [Indexed: 10/26/2022]
Abstract
Most proteasome substrates are marked for degradation by ubiquitin conjugation, but some are targeted by other means. The properties of these exceptional cases provide insights into the general requirements for proteasomal degradation. Here the focus is on three ubiquitin-independent substrates that have been the subject of detailed study: Rpn4, a transcriptional regulator of proteasome homeostasis; thymidylate synthase, an enzyme required for the production of DNA precursors; and ornithine decarboxylase, the first committed enzyme of polyamine biosynthesis. It can be inferred from these cases that proteasome association and the presence of an unstructured region are the sole prerequisites for degradation. Based on that inference, artificial substrates have been designed to test the proteasome's capacity for substrate processing and its limitations. Ubiquitin-independent substrates may in some cases be a remnant of the pre-ubiquitome world, but in other cases could provide optimized regulatory solutions. This article is part of a Special Issue entitled: Ubiquitin-Proteasome System. Guest Editors: Thomas Sommer and Dieter H. Wolf.