1. Harrison PM. Intrinsically Disordered Compositional Bias in Proteins: Sequence Traits, Region Clustering, and Generation of Hypothetical Functional Associations. Bioinform Biol Insights 2024; 18:11779322241287485. [PMID: 39417089; PMCID: PMC11481073; DOI: 10.1177/11779322241287485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 08/27/2024] [Indexed: 10/19/2024] [Open access]
Abstract
Compositionally biased regions (CBRs), ie, tracts that are dominated by a subset of residue types, are common features of eukaryotic proteins. These are often found bounded within or almost coterminous with intrinsically disordered or 'natively unfolded' parts. Here, it is investigated how the function of such intrinsically disordered compositionally biased regions (ID-CBRs) is directly linked to their compositional traits, focusing on the well-characterized yeast (Saccharomyces cerevisiae) proteome as a test case. The ID-CBRs that are clustered together using compositional distance are discovered to have clear functional linkages at various levels of diversity. The specific case of the Sup35p and Rnq1p proteins that underlie causally linked prion phenomena ([PSI+] and [RNQ+]) is highlighted. Their prion-forming ID-CBRs are typically clustered very close together indicating some compositional engendering for [RNQ+] seeding of [PSI+] prions. Delving further, ID-CBRs with distinct types of residue patterning such as 'blocking' or relative segregation of residues into homopeptides are found to have significant functional trends. Specific examples of such ID-CBR functional linkages that are discussed are: Q/N-rich ID-CBRs linked to transcriptional coactivation, S-rich to transcription-factor binding, R-rich to DNA-binding, S/E-rich to protein localization, and D-rich linked to chromatin remodelling. These data may be useful in informing experimental hypotheses for proteins containing such regions.
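The abstract clusters ID-CBRs by "compositional distance" without defining the measure here. The following is a minimal sketch under stated assumptions, not the paper's method. Each region is reduced to a 20-dimensional amino-acid composition vector. Euclidean distance between those vectors stands in for the distance measure. Regions are grouped by single-linkage clustering under a hypothetical `cutoff`. The actual metric, clustering method, and thresholds used in the study may differ. The three toy regions are invented and are not the Sup35p or Rnq1p sequences.

```python
import math
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> list[float]:
    """Fraction of each of the 20 standard residues in seq."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def comp_distance(a: str, b: str) -> float:
    """Euclidean distance between composition vectors (an assumed metric)."""
    return math.dist(composition(a), composition(b))

def single_linkage(regions: dict[str, str], cutoff: float) -> list[set[str]]:
    """Group regions linked by any chain of pairwise distances <= cutoff (union-find)."""
    parent = {name: name for name in regions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(regions, 2):
        if comp_distance(regions[a], regions[b]) <= cutoff:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in regions:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())

# Hypothetical toy regions, not real ID-CBR sequences.
regions = {
    "qn_a": "QQNQNNQQQNQQNNQQ",
    "qn_b": "NQQQNQNQQNNQQQNQ",
    "s_rich": "SSGSSSTSSSGSSSSS",
}
print(single_linkage(regions, cutoff=0.3))
```

The two Q/N-rich toy regions share one cluster because their residue counts are identical (distance 0), even though the residues are ordered differently. This also shows that a composition-only distance cannot see patterning such as the "blocking" of residues into homopeptides that the abstract treats as a separate trait.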
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, QC, Canada
2. Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022; DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] [Open access]
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
- Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
3. Cascarina SM, Ross ED. Identification of Low-Complexity Domains by Compositional Signatures Reveals Class-Specific Frequencies and Functions Across the Domains of Life. PLoS Comput Biol 2024; 20:e1011372. [PMID: 38748749; PMCID: PMC11132505; DOI: 10.1371/journal.pcbi.1011372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 05/28/2024] [Accepted: 05/04/2024] [Indexed: 05/29/2024] [Open access]
Abstract
Low-complexity domains (LCDs) in proteins are typically enriched in one or two predominant amino acids. As a result, LCDs often exhibit unusual structural/biophysical tendencies and can occupy functional niches. However, for each organism, protein sequences must be compatible with intracellular biomolecules and physicochemical environment, both of which vary from organism to organism. This raises the possibility that LCDs may occupy sequence spaces in select organisms that are otherwise prohibited in most organisms. Here, we report a comprehensive survey and functional analysis of LCDs in all known reference proteomes (>21k organisms), with added focus on rare and unusual types of LCDs. LCDs were classified according to both the primary amino acid and secondary amino acid in each LCD sequence, facilitating detailed comparisons of LCD class frequencies across organisms. Examination of LCD classes at different depths (i.e., domain of life, organism, protein, and per-residue levels) reveals unique facets of LCD frequencies and functions. To our surprise, all 400 LCD classes occur in nature, although some are exceptionally rare. A number of rare classes can be defined for each domain of life, with many LCD classes appearing to be eukaryote-specific. Certain LCD classes were consistently associated with identical functions across many organisms, particularly in eukaryotes. Our analysis methods enable simultaneous, direct comparison of all LCD classes between individual organisms, resulting in a proteome-scale view of differences in LCD frequencies and functions. Together, these results highlight the remarkable diversity and functional specificity of LCDs across all known life forms.
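The 400 LCD classes come from pairing a primary amino acid with a secondary one. One labelling scheme yields exactly 20 × 20 = 400 labels: 20 primary-only classes plus 20 × 19 ordered primary/secondary pairs. The sketch below assumes that scheme. Its fraction thresholds (`PRIMARY_MIN`, `SECONDARY_MIN`) are hypothetical. It is not the authors' classification pipeline, whose definitions and thresholds may differ.

```python
from collections import Counter
from itertools import permutations
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical thresholds, for illustration only (not taken from the paper).
PRIMARY_MIN = 0.40    # minimum fraction for the dominant residue
SECONDARY_MIN = 0.20  # minimum fraction for the runner-up residue

def lcd_class(seq: str) -> Optional[str]:
    """Label a region 'X' (primary only), 'X/Y' (primary/secondary), or None if nothing dominates."""
    if not seq:
        return None
    counts = Counter(seq.upper())
    n = len(seq)
    ranked = counts.most_common(2)  # [(residue, count), ...], highest count first
    primary, p_count = ranked[0]
    if p_count / n < PRIMARY_MIN:
        return None
    if len(ranked) > 1 and ranked[1][1] / n >= SECONDARY_MIN:
        return f"{primary}/{ranked[1][0]}"
    return primary

# Every label this scheme can produce: 20 primary-only + 20 * 19 ordered pairs.
all_classes = list(AMINO_ACIDS) + [f"{a}/{b}" for a, b in permutations(AMINO_ACIDS, 2)]
print(len(all_classes))             # 400
print(lcd_class("QQQQNNQQNQQNQQ"))  # Q/N
```

Under this scheme, treating "Q/N" and "N/Q" as distinct classes is what brings the count to 400. Rare classes then correspond to primary/secondary pairs that almost never co-dominate a region.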
Affiliation(s)
- Sean M. Cascarina
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado, United States of America
- Eric D. Ross
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado, United States of America
4. Harrison PM. Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins. Sci Rep 2024; 14:680. [PMID: 38182699; PMCID: PMC10770407; DOI: 10.1038/s41598-023-50991-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/28/2023] [Indexed: 01/07/2024] [Open access]
Abstract
Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed 'low-complexity regions' (LCRs) or 'compositionally-biased regions' (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or 'cover' more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.
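SEG annotates low-complexity stretches by scoring the compositional complexity of sliding windows against thresholds. The following is a simplified sketch of that windowed idea. It assumes Shannon entropy in bits as the complexity score and uses a single threshold. Full SEG uses separate trigger and extension thresholds plus a refinement step, and fLPS is not sketched here. The `window=12` and `threshold=2.2` defaults are illustrative assumptions. Varying them shows the kind of parameter dependence the abstract discusses.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())

def low_entropy_intervals(seq: str, window: int = 12, threshold: float = 2.2) -> list[tuple[int, int]]:
    """Merge overlapping windows with entropy <= threshold into 0-based, end-exclusive intervals."""
    intervals: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            start, end = i, i + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # extend the current interval
            else:
                intervals.append((start, end))
    return intervals

# Toy sequence: 19 distinct residues followed by a 15-residue Q tract.
seq = "ACDEFGHIKLMNPRSTVWY" + "Q" * 15
print(low_entropy_intervals(seq))  # [(14, 34)]
```

The merged interval starts at index 14, five residues before the Q tract begins at index 19. Windows that straddle the boundary still fall under the threshold. This kind of boundary effect is one reason the choice of window and threshold changes which regions are annotated.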
Affiliation(s)
- Paul M Harrison
- Department of Biology, McGill University, Montreal, QC, Canada
5. Gómez-Pérez D, Schmid M, Chaudhry V, Hu Y, Velic A, Maček B, Ruhe J, Kemen A, Kemen E. Proteins released into the plant apoplast by the obligate parasitic protist Albugo selectively repress phyllosphere-associated bacteria. New Phytol 2023; 239:2320-2334. [PMID: 37222268; DOI: 10.1111/nph.18995] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/23/2023] [Accepted: 04/11/2023] [Indexed: 05/25/2023]
Abstract
Biotic and abiotic interactions shape natural microbial communities. The mechanisms behind microbe-microbe interactions, particularly those protein based, are not well understood. We hypothesize that released proteins with antimicrobial activity are a powerful and highly specific toolset to shape and defend plant niches. We have studied Albugo candida, an obligate plant parasite from the protist Oomycota phylum, for its potential to modulate the growth of bacteria through release of antimicrobial proteins into the apoplast. Amplicon sequencing and network analysis of Albugo-infected and uninfected wild Arabidopsis thaliana samples revealed an abundance of negative correlations between Albugo and other phyllosphere microbes. Analysis of the apoplastic proteome of Albugo-colonized leaves combined with machine learning predictors enabled the selection of antimicrobial candidates for heterologous expression and study of their inhibitory function. We found for three candidate proteins selective antimicrobial activity against Gram-positive bacteria isolated from A. thaliana and demonstrate that these inhibited bacteria are precisely important for the stability of the community structure. We could ascribe the antibacterial activity of the candidates to intrinsically disordered regions and positively correlate it with their net charge. This is the first report of protist proteins with antimicrobial activity under apoplastic conditions that therefore are potential biocontrol tools for targeted manipulation of the microbiome.
Affiliation(s)
- Daniel Gómez-Pérez
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Monja Schmid
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Vasvi Chaudhry
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Yiheng Hu
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Ana Velic
- Department of Biology, Quantitative Proteomics Group, Interfaculty Institute of Cell Biology, University of Tübingen, 72076, Tübingen, Germany
- Boris Maček
- Department of Biology, Quantitative Proteomics Group, Interfaculty Institute of Cell Biology, University of Tübingen, 72076, Tübingen, Germany
- Jonas Ruhe
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Ariane Kemen
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Eric Kemen
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
- Microbial Interactions in Plant Ecosystems, Center for Plant Molecular Biology, University of Tübingen, 72076, Tübingen, Germany
6. Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000; PMCID: PMC10111144; DOI: 10.12688/f1000research.129929.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/02/2023] [Indexed: 02/22/2023] [Open access]
Abstract
Background: The evolutionary rate of disordered proteins varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of intrinsically disordered regions (IDRs) across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The UniProt Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards low complexity regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated, low complexity proteins across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of low complexity, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
7. Kouros CE, Makri V, Ouzounis CA, Chasapi A. Disease association and comparative genomics of compositional bias in human proteins. F1000Res 2023; 12:198. [PMID: 37082000; PMCID: PMC10111144; DOI: 10.12688/f1000research.129929.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/12/2023] [Indexed: 04/25/2023] [Open access]
Abstract
Background: The evolutionary rate of disordered protein regions varies greatly due to the lack of structural constraints. So far, few studies have investigated the presence/absence patterns of compositional bias, indicative of disorder, across phylogenies in conjunction with human disease. In this study, we report a genome-wide analysis of compositional bias association with disease in human proteins and their taxonomic distribution. Methods: The human genome protein set provided by the Ensembl database was annotated and analysed with respect to both disease associations and the detection of compositional bias. The UniProt Reference Proteome dataset, containing 11297 proteomes, was used as the target dataset for the comparative genomics of a well-defined subset of the Human Genome, including 100 characteristic, compositionally biased proteins, some linked to disease. Results: Cross-evaluation of compositional bias and disease-association in the human genome reveals a significant bias towards biased regions in disease-associated genes, with charged, hydrophilic amino acids appearing as over-represented. The phylogenetic profiling of 17 disease-associated proteins with compositional bias across 11297 proteomes captures characteristic taxonomic distribution patterns. Conclusions: This is the first time that a combined genome-wide analysis of compositional bias, disease-association and taxonomic distribution of human proteins is reported, covering structural, functional, and evolutionary properties. The reported framework can form the basis for large-scale, follow-up projects, encompassing the entire human genome and all known gene-disease associations.
Affiliation(s)
- Christos E. Kouros
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Vasiliki Makri
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- BCCB-AIIA, School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
- Anastasia Chasapi
- BCPL, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece
8. Luo J, Harrison PM. Evolution of sequence traits of prion-like proteins linked to amyotrophic lateral sclerosis (ALS). PeerJ 2022; 10:e14417. [PMID: 36415860; PMCID: PMC9676014; DOI: 10.7717/peerj.14417] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/19/2022] [Accepted: 10/28/2022] [Indexed: 11/18/2022] [Open access]
Abstract
Prions are proteinaceous particles that can propagate an alternative conformation to further copies of the same protein. They have been described in mammals, fungi, bacteria and archaea. Furthermore, across diverse organisms from bacteria to eukaryotes, prion-like proteins that have similar sequence characters are evident. Such prion-like proteins have been linked to pathomechanisms of amyotrophic lateral sclerosis (ALS) in humans, in particular TDP43, FUS, TAF15, EWSR1 and hnRNPA2. Because of the desire to study human disease-linked proteins in model organisms, and to gain insights into the functionally important parts of these proteins and how they have changed across hundreds of millions of years of evolution, we analyzed how the sequence traits of these five proteins have evolved across eukaryotes, including plants and metazoa. We discover that the RNA-binding domain architecture of these proteins is deeply conserved since their emergence. Prion-like regions are also deeply and widely conserved since the origination of the protein families for FUS, TAF15 and EWSR1, and since the last common ancestor of metazoa for TDP43 and hnRNPA2. Prion-like composition is uncommon or weak in any plant orthologs observed; however, in TDP43, many plant proteins have equivalent regions rich in other amino acids (namely glycine and tyrosine and/or serine) that may be linked to stress granule recruitment. Deeply conserved low-complexity domains are identified that likely have functional significance.
9. Teekas L, Sharma S, Vijay N. Lineage-specific protein repeat expansions and contractions reveal malleable regions of immune genes. Genes Immun 2022; 23:218-234. [PMID: 36203090; DOI: 10.1038/s41435-022-00186-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/21/2022] [Accepted: 09/22/2022] [Indexed: 01/07/2023]
Abstract
Functional diversification, a higher evolutionary rate, and intense positive selection help a limited number of immune genes interact with many pathogens. Repeats in protein-coding regions are a well-known source of functional diversification, adaptive variation, and evolutionary novelty in a short time. Repeats play a crucial role in biochemical functions like functional diversification of transcription regulation, protein kinases, cell adhesion, signaling pathways, morphogenesis, DNA repair, recombination, and RNA processing. Repeat length variation can change the associated protein's interaction, efficacy, and overall protein network. Repeats have an intrinsic unstable nature and can potentially evolve rapidly and expedite the acquisition of complex phenotypic traits and functions. Because of their ability to generate rapid, adaptive variations over short evolutionary distances, repeats are considered "tuning knobs." Repeat length variation in specific genes, like RUNX2 and ALX4, is associated with morphological and physiological changes across vertebrates. Here we study repeat length variation as a potent source of species-specific immune diversification across several clades of tetrapods. Moreover, we provide a clade-wise comprehensive list of immune genes with repeat types for future studies of morphological/evolutionary changes within species groups. We observe significant repeat length variation of FASLG and C1QC in Rodentia and Primates' contrasting species groups, respectively.
Affiliation(s)
- Lokdeep Teekas
- Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
- Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
- Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
10. Aledo JC. A Census of Human Methionine-Rich Prion-like Domain-Containing Proteins. Antioxidants (Basel) 2022; 11:antiox11071289. [PMID: 35883780; PMCID: PMC9312190; DOI: 10.3390/antiox11071289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 11/16/2022] [Open access]
Abstract
Methionine-rich prion-like proteins can regulate liquid–liquid phase separation processes in response to stresses. To date, however, very few proteins have been identified as methionine-rich prion-like. Herein, we have performed a computational survey of the human proteome to search for methionine-rich prion-like domains. We present a census of 51 manually curated methionine-rich prion-like proteins. Our results show that these proteins tend to be modular in nature, with molecular sizes significantly greater than those we would expect due to random sampling effects. These proteins also exhibit a remarkably high degree of spatial compaction when compared to average human proteins, even when protein size is accounted for. Computational evidence suggests that such a high degree of compactness might be due to the aggregation of methionine residues, pointing to a potential redox regulation of compactness. Gene ontology and network analyses, performed to shed light on the biological processes in which these proteins might participate, indicate that methionine-rich and non-methionine-rich prion-like proteins share gene ontology terms related to the regulation of transcription and translation but, more interestingly, these analyses also reveal that proteins from the methionine-rich group tend to share more gene ontology terms among them than they do with their non-methionine-rich prion-like counterparts.
Affiliation(s)
- Juan Carlos Aledo
- Department of Molecular Biology and Biochemistry, University of Malaga, 29071 Malaga, Spain