1
|
Jain A, Begum T, Ahmad S. Analysis and Prediction of Pathogen Nucleic Acid Specificity for Toll-like Receptors in Vertebrates. J Mol Biol 2023; 435:168208. [PMID: 37479078 DOI: 10.1016/j.jmb.2023.168208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/20/2023] [Accepted: 07/13/2023] [Indexed: 07/23/2023]
Abstract
Identification of key sequence, expression and function related features of nucleic acid-sensing host proteins is of fundamental importance to understand the dynamics of pathogen-specific host responses. To meet this objective, we considered toll-like receptors (TLRs), a representative class of membrane-bound sensor proteins, from 17 vertebrate species covering mammals, birds, reptiles, amphibians, and fishes in this comparative study. We identified the molecular signatures of host TLRs that are responsible for sensing pathogen nucleic acids or other pathogen-associated molecular patterns (PAMPs), and potentially play important roles in host defence mechanism. Interestingly, our findings reveal that such host-specific features are directly related to the strand (single or double) specificity of nucleic acid from pathogens. However, during host-pathogen interactions, such features were unable to explain the pathogenic PAMP (i.e., DNA, RNA or other) selectivity, suggesting a more complex mechanism. Using these features, we developed a number of machine learning models, of which Random Forest achieved a high performance (94.57% accuracy) to predict strand specificity of TLRs from protein-derived features. We applied the trained model to propose strand specificity of some previously uncharacterized distinct fish-specific novel TLRs (TLR18, TLR23, TLR24, TLR25, TLR27).
Affiliation(s)
- Anuja Jain
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India. https://twitter.com/@Anuja334
- Tina Begum
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
|
2
|
Li J, Li N, Roellig DM, Zhao W, Guo Y, Feng Y, Xiao L. High subtelomeric GC content in the genome of a zoonotic Cryptosporidium species. Microb Genom 2023; 9:mgen001052. [PMID: 37399068 PMCID: PMC10438818 DOI: 10.1099/mgen.0.001052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 05/24/2023] [Indexed: 07/04/2023] Open
Abstract
Cryptosporidium canis is a zoonotic species causing cryptosporidiosis in humans in addition to its natural hosts dogs and other fur animals. To understand the genetic basis for host adaptation, we sequenced the genomes of C. canis from dogs, minks, and foxes and conducted a comparative genomics analysis. While the genomes of C. canis have similar gene contents and organisations, they (~41.0 %) and C. felis (39.6 %) have GC content much higher than other Cryptosporidium spp. (24.3-32.9 %) sequenced to date. The high GC content is mostly restricted to subtelomeric regions of the eight chromosomes. Most of these GC-balanced genes encode Cryptosporidium-specific proteins that have intrinsically disordered regions and are involved in host-parasite interactions. Natural selection appears to play a more important role in the evolution of codon usage in GC-balanced C. canis, and most of the GC-balanced genes have undergone positive selection. While the identity in whole genome sequences between the mink- and dog-derived isolates is 99.9 % (9365 SNVs), it is only 96.0 % (362 894 SNVs) between them and the fox-derived isolate. In agreement with this, the fox-derived isolate possesses more subtelomeric genes encoding invasion-related protein families. Therefore, the change in subtelomeric GC content appears to be responsible for the more GC-balanced C. canis genomes, and the fox-derived isolate could represent a new Cryptosporidium species.
Affiliation(s)
- Jiayu Li
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
- Na Li
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
- Dawn M. Roellig
- Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
- Wentao Zhao
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
- Yaqiong Guo
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
- Yaoyu Feng
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
- Lihua Xiao
- State Key Laboratory for Animal Disease Control and Prevention, South China Agricultural University, Guangzhou 510642, PR China
|
3
|
Biological soft matter: intrinsically disordered proteins in liquid-liquid phase separation and biomolecular condensates. Essays Biochem 2022; 66:831-847. [PMID: 36350034 DOI: 10.1042/ebc20220052] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/14/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 11/10/2022]
Abstract
The facts that many proteins with crucial biological functions do not have unique structures and that many biological processes are compartmentalized into the liquid-like biomolecular condensates, which are formed via liquid-liquid phase separation (LLPS) and are not surrounded by the membrane, are revolutionizing the modern biology. These phenomena are interlinked, as the presence of intrinsic disorder represents an important requirement for a protein to undergo LLPS that drives biogenesis of numerous membrane-less organelles (MLOs). Therefore, one can consider these phenomena as crucial constituents of a new IDP-LLPS-MLO field. Furthermore, intrinsically disordered proteins (IDPs), LLPS, and MLOs represent a clear link between molecular and cellular biology and soft matter and condensed soft matter physics. Both IDP and LLPS/MLO fields are undergoing explosive development and generate the ever-increasing mountain of crucial data. These new data provide answers to so many long-standing questions that it is difficult to imagine that in the very recent past, protein scientists and cellular biologists operated without taking these revolutionary concepts into account. The goal of this essay is not to deliver a comprehensive review of the IDP-LLPS-MLO field but to provide a brief and rather subjective outline of some of the recent developments in these exciting fields.
|
4
|
Intrinsically Disordered Proteins: An Overview. Int J Mol Sci 2022; 23:ijms232214050. [PMID: 36430530 PMCID: PMC9693201 DOI: 10.3390/ijms232214050] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 09/04/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Many proteins and protein segments cannot attain a single stable three-dimensional structure under physiological conditions; instead, they adopt multiple interconverting conformational states. Such intrinsically disordered proteins or protein segments are highly abundant across proteomes, and are involved in various effector functions. This review focuses on different aspects of disordered proteins and disordered protein regions, which form the basis of the so-called "Disorder-function paradigm" of proteins. Additionally, various experimental approaches and computational tools used for characterizing disordered regions in proteins are discussed. Finally, the role of disordered proteins in diseases and their utility as potential drug targets are explored.
|
5
|
Kurgan L. Resources for computational prediction of intrinsic disorder in proteins. Methods 2022; 204:132-141. [DOI: 10.1016/j.ymeth.2022.03.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/28/2022] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 12/26/2022] Open
|
6
|
Thi Nhu Bui Q, Kim H, Wang H, Ki JS. Unveiling the genomic structures and evolutionary events of the saxitoxin biosynthetic gene sxtA in the marine toxic dinoflagellate Alexandrium. Mol Phylogenet Evol 2022; 168:107417. [PMID: 35031458 DOI: 10.1016/j.ympev.2022.107417] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/09/2021] [Revised: 11/24/2021] [Accepted: 12/27/2021] [Indexed: 12/30/2022]
Abstract
Marine dinoflagellates Alexandrium are known to produce saxitoxin (STX) and cause paralytic shellfish poisoning (PSP), which can result in mortality in humans. SxtA is considered a core gene for the biosynthesis of STX. However, its gene coding structure and evolutionary history have yet to be fully elucidated. Here, we determined the full-length sequences of sxtA cDNA and genomic coding regions from two toxic dinoflagellates, Alexandrium catenella (LIMS-PS-2645 and LIMS-PS-2647) and A. pacificum (LMBE-C4), characterised their domain structures, and resolved evolutionary events. The sxtA gene was encoded on the genome without introns, and was identical in length (4002 bp) between two A. catenella strains, but their sequences differed from A. pacificum (5031 bp). SxtA consists of four domains, sxtA1, sxtA2, sxtA3, and sxtA4; however, A. pacificum has an extra domain TauD near sxtA1. Each domain had >64.4% GC content, with the highest being 71.6% in sxtA3. Molecular divergence was found to be significantly higher in sxtA4 than in the other domains. Phylogenetic trees of sxtA and separate domains showed that bacteria diverged earliest, followed by non-toxic cyanobacteria, toxic cyanobacteria, and toxic dinoflagellates. While sxtA domains in Alexandrium were similar to the PKS-like structure with the conserved sxtA1, sxtA2, and sxtA3, PKS_KS may be replaced by sxtA4 in toxic Alexandrium. These results suggest that sxtA in Alexandrium may have evolved by acquiring specific domains, whose modification and complexity markedly affect toxin biosynthesis.
Affiliation(s)
- Quynh Thi Nhu Bui
- Department of Biotechnology, Sangmyung University, Seoul 03016, South Korea
- Hansol Kim
- Department of Biotechnology, Sangmyung University, Seoul 03016, South Korea
- Hui Wang
- Department of Biotechnology, Sangmyung University, Seoul 03016, South Korea; Hunan Province Key Laboratory of Typical Environmental Pollution and Health Hazards, School of Public Health, University of South China, Hengyang 421001, China
- Jang-Seu Ki
- Department of Biotechnology, Sangmyung University, Seoul 03016, South Korea.
|
7
|
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022] Open
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are majorly coupled to GC/AT-bias, exhibiting a bi-furcated correlation with degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and there are an opposing correlation and anti-correlation versus homopeptide levels for intrinsic disorder and structured-domain content respectively. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine whose homocodons are overwhelmingly made from the codon AAG.
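The definitions in this abstract (a homopeptide is a run of one amino-acid type, a homocodon is a pure codon repeat) can be illustrated with a minimal Python sketch. The function names and the `min_len` threshold of 5 are assumptions made for illustration only; the abstract does not state which minimum run length the authors used.

```python
import re

def homopeptide_runs(protein, min_len=5):
    """Return (residue, start, length) for every run of a single
    amino-acid type that is at least min_len residues long.
    min_len=5 is an illustrative assumption, not the paper's cutoff."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]

def homocodon_runs(cds, min_len=5):
    """Return (codon, start_codon_index, length) for every run of one
    identical codon that is at least min_len codons long.
    The coding sequence is read in frame from position 0."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    runs = []
    start = 0
    for i in range(1, len(codons) + 1):
        # Close the current run at the end of the sequence or on a codon change.
        if i == len(codons) or codons[i] != codons[start]:
            if i - start >= min_len:
                runs.append((codons[start], start, i - start))
            start = i
    return runs
```

For a toy sequence, `homopeptide_runs("MAQQQQQQKLS")` reports one poly-glutamine run, and a homopeptide encoded by mixed synonymous codons (for example CAG and CAA) would appear in the first function but not necessarily in the second.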
|
8
|
Zhao B, Katuwawala A, Uversky VN, Kurgan L. IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell. Cell Mol Life Sci 2021; 78:2371-2385. [PMID: 32997198 PMCID: PMC11071772 DOI: 10.1007/s00018-020-03654-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/08/2020] [Revised: 09/09/2020] [Accepted: 09/22/2020] [Indexed: 12/11/2022]
Abstract
Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in the human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time disorder is comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, 33612, USA.
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia.
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
|
9
|
Exploring Potential Signals of Selection for Disordered Residues in Prokaryotic and Eukaryotic Proteins. Genomics Proteomics & Bioinformatics 2020; 18:549-564. [PMID: 33346088 PMCID: PMC8377245 DOI: 10.1016/j.gpb.2020.06.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/12/2019] [Revised: 03/29/2020] [Accepted: 06/10/2020] [Indexed: 11/22/2022]
Abstract
Intrinsically disordered proteins (IDPs) are an important class of proteins in all domains of life for their functional importance. However, how nature has shaped the disorder potential of prokaryotic and eukaryotic proteins is still not clearly known. Randomly generated sequences are free of any selective constraints, thus these sequences are commonly used as null models. Considering different types of random protein models, here we seek to understand how the disorder potential of natural eukaryotic and prokaryotic proteins differs from random sequences. Comparing proteome-wide disorder content between real and random sequences of 12 model organisms, we noticed that eukaryotic proteins are enriched in disordered regions compared to random sequences, but in prokaryotes such regions are depleted. By analyzing the position-wise disorder profile, we show that there is a generally higher disorder near the N- and C-terminal regions of eukaryotic proteins as compared to the random models; however, either no or a weak such trend was found in prokaryotic proteins. Moreover, here we show that this preference is not caused by the amino acid or nucleotide composition at the respective sites. Instead, these regions were found to be endowed with a higher fraction of protein–protein binding sites, suggesting their functional importance. We discuss several possible explanations for this pattern, such as improving the efficiency of protein–protein interaction, ribosome movement during translation, and post-translational modification. However, further studies are needed to clearly understand the biophysical mechanisms causing the trend.
|
10
|
Dayhoff GW, Regenmortel MHV, Uversky VN. Intrinsic disorder in protein sense‐antisense recognition. J Mol Recognit 2020; 33:e2868. [DOI: 10.1002/jmr.2868] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/04/2020] [Revised: 05/04/2020] [Accepted: 05/18/2020] [Indexed: 01/03/2023]
Affiliation(s)
- Guy W. Dayhoff
- Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, Florida, USA
- Vladimir N. Uversky
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
|
11
|
Yan J, Cheng J, Kurgan L, Uversky VN. Structural and functional analysis of "non-smelly" proteins. Cell Mol Life Sci 2020; 77:2423-2440. [PMID: 31486849 PMCID: PMC11105052 DOI: 10.1007/s00018-019-03292-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 08/28/2019] [Indexed: 01/09/2023]
Abstract
Cysteine and aromatic residues are major structure-promoting residues. We assessed the abundance, structural coverage, and functional characteristics of the "non-smelly" proteins, i.e., proteins that do not contain cysteine residues (C-depleted) or cysteine and aromatic residues (CFYWH-depleted), across 817 proteomes from all domains of life. The analysis revealed that although these proteomes contained significant levels of the C-depleted proteins, with prokaryotes being significantly more enriched in such proteins than eukaryotes, the CFYWH-depleted proteins were relatively rare, accounting for about 0.05% of proteomes. Furthermore, CFYWH-depleted proteins were virtually never found in PDB. Depletion in cysteine and in aromatic residues was associated with the substantially increased intrinsic disorder levels across all domains of life. Archaeal and eukaryotic organisms with higher levels of the C-depleted proteins were shown to have higher levels of the intrinsic disorder and lower levels of structural coverage. We also showed that the "non-smelly" proteins typically did not independently fold into monomeric structures, and instead, they fold by interacting with nucleic acids as constituents of the ribosome and nucleosome complexes. They were shown to be involved in translation, transcription, nucleosome assembly, transmembrane transport, and protein folding functions, all of which are known to be associated with the intrinsic disorder. Our data suggested that, in general, structure of monomeric proteins is crucially dependent on the presence of cysteine and aromatic residues.
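The two categories defined in this abstract can be expressed as a simple sequence filter. The sketch below is illustrative only: the function names are hypothetical, and it assumes proteins are given as one-letter amino-acid strings. It checks the stricter CFYWH-depleted class first, since every CFYWH-depleted protein is also C-depleted.

```python
def lacks_residues(protein, residues):
    """True if none of the given one-letter residues occur in the protein."""
    return not any(r in protein for r in residues)

def classify_non_smelly(protein):
    """Label a protein sequence using the abstract's two definitions.

    'CFYWH-depleted': contains no C, F, Y, W or H.
    'C-depleted':     contains no C (but at least one of F, Y, W, H).
    'other':          contains at least one cysteine.
    """
    seq = protein.upper()
    if lacks_residues(seq, "CFYWH"):
        return "CFYWH-depleted"
    if lacks_residues(seq, "C"):
        return "C-depleted"
    return "other"
```

Applied across a proteome, counting the three labels would give per-proteome fractions comparable in kind to those the abstract reports, although the authors' actual pipeline is not described here.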
Affiliation(s)
- Jing Yan
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd., MDC07, Tampa, FL, 33612, USA.
- Protein Research Group, Institute for Biological Instrumentation of the Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
|
12
|
Oldfield CJ, Peng Z, Uversky VN, Kurgan L. Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins. Cell Mol Life Sci 2020; 77:149-160. [PMID: 31175370 PMCID: PMC11104855 DOI: 10.1007/s00018-019-03166-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/03/2019] [Revised: 05/14/2019] [Accepted: 05/28/2019] [Indexed: 02/06/2023]
Abstract
Protein-coding nucleic acids exhibit composition and codon biases between sequences coding for intrinsically disordered regions (IDRs) and those coding for structured regions. IDRs are regions of proteins that are folding self-insufficient and which function without the prerequisite of folded structure. Several authors have investigated composition bias or codon selection in regions encoding for IDRs, primarily in Eukaryota, and concluded that elevated GC content is the result of the biased amino acid composition of IDRs. We substantively extend previous work by examining GC content in regions encoding IDRs, from 44 species in Eukaryota, Archaea, and Bacteria, spanning a wide range of GC content. We confirm that regions coding for IDRs show a significantly elevated GC content, even across all domains of life. Although this is largely attributable to the amino acid composition bias of IDRs, we show that this bias is independent of the overall GC content and, most importantly, we are the first to observe that GC content bias in IDRs is significantly different than expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. This selection is dependent on the overall GC content of the organism. The codon selection bias manifests as use of infrequent, AT-rich codons in encoding IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used, and independent of estimated translation efficiency. These observations are consistent with the previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
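The comparison at the heart of this abstract, observed GC content versus the GC content expected from amino acid composition alone, can be sketched as follows. This is not the authors' method: the sketch assumes the standard genetic code and, as a simplifying null model, equal usage of all synonymous codons for each amino acid. The names `gc_fraction`, `MEAN_GC` and `observed_and_expected_gc` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# Mean GC fraction of each amino acid's codons, assuming (as a null model)
# that all synonymous codons are used equally often.
_gc_by_aa = {}
for _codon, _aa in CODON_TABLE.items():
    _gc_by_aa.setdefault(_aa, []).append(gc_fraction(_codon))
MEAN_GC = {aa: sum(v) / len(v) for aa, v in _gc_by_aa.items()}

def observed_and_expected_gc(cds):
    """Return (observed GC, GC expected from the encoded amino acids)
    for an in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    observed = gc_fraction("".join(codons))
    expected = sum(MEAN_GC[CODON_TABLE[c]] for c in codons) / len(codons)
    return observed, expected
```

Under this null model, an observed value below the expected one for a region would be consistent with preferential use of AT-rich synonymous codons, which is the direction of the compensatory codon selection the abstract describes for IDR-coding regions.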
Affiliation(s)
- Christopher J Oldfield
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA.
- Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, 300072, China
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA.
|
13
|
Evolutionary Forces and Codon Bias in Different Flavors of Intrinsic Disorder in the Human Proteome. J Mol Evol 2019; 88:164-178. [DOI: 10.1007/s00239-019-09921-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/10/2019] [Accepted: 11/26/2019] [Indexed: 12/22/2022]
|
14
|
Basile W, Salvatore M, Bassot C, Elofsson A. Why do eukaryotic proteins contain more intrinsically disordered regions? PLoS Comput Biol 2019; 15:e1007186. [PMID: 31329574 PMCID: PMC6675126 DOI: 10.1371/journal.pcbi.1007186] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/28/2018] [Revised: 08/01/2019] [Accepted: 06/14/2019] [Indexed: 12/12/2022] Open
Abstract
Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins, from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered, 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that the changes in abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine; while prokaryotic proteins have 6.5%, eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine compared with 4.0% proline and ≈ 7.5% isoleucine in the prokaryotes. All these three differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and type of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered or do the differences cause the increased disorder? Intrinsic disorder is essential for various functions in eukaryotic cells and is a signature of eukaryotic proteins. 
Here, we try to understand the origin of the difference in disorder between eukaryotic and prokaryotic proteins. We show that eukaryotic proteins contain more extended linker regions and that these linker regions are significantly more disordered. Further, we show, for the first time, that the difference in disorder originates from a systematic difference in amino acid frequencies between eukaryotic and prokaryotic proteins. Three amino acids contribute to the difference in disorder; serine and proline are more abundant in eukaryotic linkers, while isoleucine is less frequent. These shifts in frequencies are observed in all phyla, protein families, structural regions and type of protein but are most pronounced in disordered and linker regions. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. In any case, the widespread nature of the shifts in abundance indicates that the differences are ancient and caused by some not yet fully understood selective difference acting on eukaryotic and prokaryotic proteins.
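The serine, proline and isoleucine percentages quoted in this abstract are amino acid frequencies pooled over sets of proteins. A minimal sketch of that kind of calculation is shown below; the function name and the default residue set are illustrative assumptions, and the sketch does not reproduce the authors' data or their separation into linker and domain regions.

```python
from collections import Counter

def residue_percentages(sequences, residues="SPI"):
    """Percentage of each requested residue among all residues
    pooled over the given protein sequences (one-letter codes)."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {r: 100.0 * counts[r] / total for r in residues}
```

Running the function separately on a eukaryotic and a prokaryotic sequence set and comparing the two dictionaries would give the kind of side-by-side frequencies the abstract reports.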
Affiliation(s)
- Walter Basile
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Marco Salvatore
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Claudio Bassot
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Swedish e-Science Research Center (SeRC), Stockholm, Sweden
|
15
|
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
The growth rate of the protein sequence universe dramatically exceeds the speed of expansion for the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses is performed to systematically dissect the interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in intrinsic disorder. The disorder, structural coverage, and propensity for structural determination are mapped onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Kui Wang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA; Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
|
16
|
Arabidopsis Heat Stress-Induced Proteins Are Enriched in Electrostatically Charged Amino Acids and Intrinsically Disordered Regions. Int J Mol Sci 2018; 19:ijms19082276. [PMID: 30081447 PMCID: PMC6121531 DOI: 10.3390/ijms19082276] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/09/2018] [Revised: 07/24/2018] [Accepted: 07/31/2018] [Indexed: 01/06/2023] [Open Access]
Abstract
Comparison of the proteins of thermophilic, mesophilic, and psychrophilic prokaryotes has revealed several features characteristic of proteins adapted to high temperatures, which increase their thermostability. These characteristics include a profusion of disulfide bonds, salt bridges, hydrogen bonds, and hydrophobic interactions, and a depletion in intrinsically disordered regions. It is unclear, however, whether such differences can also be observed in eukaryotic proteins or when comparing proteins that are adapted to temperatures that are more subtly different. When an organism is exposed to high temperatures, a subset of its proteins is overexpressed (heat-induced proteins), whereas others are either repressed (heat-repressed proteins) or remain unaffected. Here, we determine the expression levels of all genes in the eukaryotic model system Arabidopsis thaliana at 22 and 37 °C, and compare both the amino acid compositions and levels of intrinsic disorder of heat-induced and heat-repressed proteins. We show that, compared to heat-repressed proteins, heat-induced proteins are enriched in electrostatically charged amino acids and depleted in polar amino acids, mirroring thermophile proteins. However, in contrast with thermophile proteins, heat-induced proteins are enriched in intrinsically disordered regions and depleted in hydrophobic amino acids. Our results indicate that temperature adaptation at the level of amino acid composition and intrinsic disorder can be observed not only in proteins of thermophilic organisms, but also in eukaryotic heat-induced proteins; the underlying adaptation pathways, however, are similar but not the same.
|
17
|
Banerjee S, Chakraborty S. Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. Mol Biosyst 2018; 13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
The emergence of new protein-coding genes in a specific lineage or species provides raw materials for evolutionary adaptations. Until recently, the biology of new genes, particularly those emerging from non-genic sequences, remained unexplored. Although new genes are subjected to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often get integrated into pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unanswered to date. Since structural orientation contributes hugely to the mode of proteins' physical interactions, we put forward an interesting question: do new genes encode proteins with stable folds? Addressing this question, we demonstrated that intrinsic disorder inversely correlates with evolutionary gene age, i.e. young proteins are richer in intrinsic disorder than ancient ones. We further noted that young proteins, which are initially poorly connected hubs, prefer to be structurally more disordered than well-connected ancient proteins. The phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We justified that structural disorder might help poorly connected young proteins to undergo promiscuous interactions, which provide the foundation for novel protein interactions. The study focuses on the evolutionary perspectives of young proteins in the light of structural adaptations.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.
|
18
|
Meng F, Murray GF, Kurgan L, Donahue HJ. Functional and structural characterization of osteocytic MLO-Y4 cell proteins encoded by genes differentially expressed in response to mechanical signals in vitro. Sci Rep 2018; 8:6716. [PMID: 29712973 PMCID: PMC5928037 DOI: 10.1038/s41598-018-25113-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/15/2017] [Accepted: 04/09/2018] [Indexed: 12/29/2022] [Open Access]
Abstract
The anabolic response of bone to mechanical load is partially the result of osteocyte response to fluid flow-induced shear stress. Understanding signaling pathways activated in osteocytes exposed to fluid flow could identify novel signaling pathways involved in the response of bone to mechanical load. Bioinformatics allows for a unique perspective and provides key first steps in understanding these signaling pathways. We examined proteins encoded by genes differentially expressed in response to fluid flow in murine osteocytic MLO-Y4 cells. We considered structural and functional characteristics including putative intrinsic disorder, evolutionary conservation, interconnectedness in protein-protein interaction networks, and cellular localization. Our analysis suggests that proteins encoded by fluid flow activated genes have lower than expected conservation, are depleted in intrinsic disorder, maintain typical levels of connectivity for the murine proteome, and are found in the cytoplasm and extracellular space. Pathway analyses reveal that these proteins are associated with cellular response to stress, chemokine and cytokine activity, enzyme binding, and osteoclast differentiation. The lower than expected disorder of proteins encoded by flow activated genes suggests they are relatively specialized.
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Graeme F Murray
- Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Henry J Donahue
- Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America
|
19
|
Functional Analysis of Human Hub Proteins and Their Interactors Involved in the Intrinsic Disorder-Enriched Interactions. Int J Mol Sci 2017; 18:ijms18122761. [PMID: 29257115 PMCID: PMC5751360 DOI: 10.3390/ijms18122761] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/01/2017] [Revised: 12/13/2017] [Accepted: 12/15/2017] [Indexed: 12/15/2022] [Open Access]
Abstract
Some intrinsically disordered proteins and protein regions are promiscuous interactors that are involved in one-to-many and many-to-one binding. Several studies have analyzed the enrichment of intrinsic disorder among promiscuous hub proteins. We extended these works by providing a detailed functional characterization of the disorder-enriched hub protein-protein interactions (PPIs), including both hubs and their interactors, and by analyzing their enrichment among disease-associated proteins. We focused on the human interactome, given its high degree of completeness and relevance to the analysis of disease-linked proteins. We quantified and investigated numerous functional and structural characteristics of the disorder-enriched hub PPIs, including protein binding, structural stability, evolutionary conservation, several categories of functional sites, and the presence of over twenty types of posttranslational modifications (PTMs). We showed that the disorder-enriched hub PPIs have a significantly larger number of disordered protein binding regions and long intrinsically disordered regions. They also include high numbers of targeting, catalytic, and many types of PTM sites. We empirically demonstrated that these hub PPIs are significantly enriched among 11 out of 18 considered classes of human diseases that are associated with at least 100 human proteins. Finally, we also illustrated how over a dozen specific human hubs utilize intrinsic disorder for their promiscuous PPIs.
|