1
Muselmani W, Kashif-Khan N, Bagnéris C, Santangelo R, Williams MA, Savva R. A Multimodal Approach towards Genomic Identification of Protein Inhibitors of Uracil-DNA Glycosylase. Viruses 2023; 15:1348. [PMID: 37376646 DOI: 10.3390/v15061348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023] Open
Abstract
DNA-mimicking proteins encoded by viruses can modulate processes such as innate cellular immunity. An example is Ung-family uracil-DNA glycosylase inhibition, which prevents Ung-mediated degradation via the stoichiometric protein blockade of the Ung DNA-binding cleft. This is significant where uracil-DNA is a key determinant in the replication and distribution of virus genomes. Unrelated protein folds support a common physicochemical spatial strategy for Ung inhibition, characterised by pronounced sequence plasticity within the diverse fold families. That, and the fact that relatively few template sequences are biochemically verified to encode Ung inhibitor proteins, presents a barrier to the straightforward identification of Ung inhibitors in genomic sequences. In this study, distant homologs of known Ung inhibitors were characterised via structural biology and structure prediction methods. A recombinant cellular survival assay and in vitro biochemical assay were used to screen distant variants and mutants to further explore tolerated sequence plasticity in motifs supporting Ung inhibition. The resulting validated sequence repertoire defines an expanded set of heuristic sequence and biophysical signatures shared by known Ung inhibitor proteins. A computational search of genome database sequences and the results of recombinant tests of selected output sequences obtained are presented here.
Affiliation(s)
- Wael Muselmani
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Naail Kashif-Khan
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Claire Bagnéris
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Rosalia Santangelo
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Mark A Williams
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
- Renos Savva
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
2
The Structure of Evolutionary Model Space for Proteins across the Tree of Life. Biology 2023; 12:282. [PMID: 36829559 PMCID: PMC9952988 DOI: 10.3390/biology12020282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/04/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.
3
Barceló-Antemate D, Fontove-Herrera F, Santos W, Merino E. The effect of the genomic GC content bias of prokaryotic organisms on the secondary structures of their proteins. PLoS One 2023; 18:e0285201. [PMID: 37141209 PMCID: PMC10159118 DOI: 10.1371/journal.pone.0285201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 04/17/2023] [Indexed: 05/05/2023] Open
Abstract
One of the main characteristics of prokaryotic genomes is the ratio in which guanine-cytosine bases are used in their DNA sequences. This is known as the genomic GC content and varies widely, from values below 20% to values greater than 74%. It has been demonstrated that the genomic GC content varies in accordance with the phylogenetic distribution of organisms and influences the amino acid composition of their corresponding proteomes. This bias is particularly important for amino acids that are coded by GC content-rich codons such as alanine, glycine, and proline, as well as amino acids that are coded by AT-rich codons, such as lysine, asparagine, and isoleucine. In our study, we extend these results by considering the effect of the genomic GC content on the secondary structure of proteins. On a set of 192 representative prokaryotic genomes and proteome sequences, we identified through a bioinformatic study that the composition of the secondary structures of the proteomes varies in relation to the genomic GC content; random coils increase as the genomic GC content increases, while alpha-helices and beta-sheets present an inverse relationship. In addition, we found that the tendency of an amino acid to form part of a secondary structure of proteins is not ubiquitous, as previously expected, but varies according to the genomic GC content. Finally, we discovered that for some specific groups of orthologous proteins, the GC content of genes biases the composition of secondary structures of the proteins for which they code.
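The genomic GC content that this abstract refers to is the fraction of guanine and cytosine bases in a DNA sequence. A minimal Python sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not taken from the paper, and ambiguous bases such as N are simply ignored here.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the ratio.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


toy_sequence = "ATGCGCGTTA"  # hypothetical example, not real genomic data
print(f"GC content: {gc_content(toy_sequence):.0%}")
```

Applied to whole genomes, a calculation of this kind gives the values that the abstract reports as ranging from below 20% to above 74%.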
Affiliation(s)
- Diana Barceló-Antemate
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
  - Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos (UAEM), Cuernavaca, Morelos, México
- Walter Santos
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Enrique Merino
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
4
Laurie J, Chattopadhyay AK, Flower DR. Protein lipograms. J Theor Biol 2017; 430:109-116. [PMID: 28716385 DOI: 10.1016/j.jtbi.2017.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2017] [Revised: 06/30/2017] [Accepted: 07/12/2017] [Indexed: 11/20/2022]
Abstract
Linguistic analysis of protein sequences is an underexploited technique. Here, we capitalize on the concept of the lipogram to characterize sequences at the proteome levels. A lipogram is a literary composition which omits one or more letters. A protein lipogram likewise omits one or more types of amino acid. In this article, we establish a usable terminology for the decomposition of a sequence collection in terms of the lipogram. Next, we characterize Uniref50 using a lipogram decomposition. At the global level, protein lipograms exhibit power-law properties. A clear correlation with metabolic cost is seen. Finally, we use the lipogram construction to assign proteomes to the four branches of the tree-of-life: archaea, bacteria, eukaryotes and viruses. We conclude from this pilot study that the lipogram demonstrates considerable potential as an additional tool for sequence analysis and proteome classification.
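The lipogram definition in this abstract (a protein that omits one or more types of amino acid) can be illustrated with a short Python sketch. It assumes the 20 standard amino acids in one-letter code; the names `missing_residues` and `is_lipogram` and the example peptide are hypothetical, and the paper's own terminology and decomposition may differ.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def missing_residues(protein: str) -> frozenset:
    """Return the standard amino acids that never occur in the sequence."""
    return AMINO_ACIDS - set(protein.upper())


def is_lipogram(protein: str) -> bool:
    """A sequence is a lipogram if it omits at least one amino acid type."""
    return bool(missing_residues(protein))


peptide = "MKTAYIAKQR"  # hypothetical example sequence
print(is_lipogram(peptide), "".join(sorted(missing_residues(peptide))))
```

Grouping the sequences of a collection by their set of missing residues is one simple way to picture a lipogram decomposition of that collection.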
Affiliation(s)
- Jason Laurie
  - School of Engineering and Applied Science, Aston University, Birmingham B4 7ET, UK; Systems Analytics Research Institute, Aston University, Birmingham B4 7ET, UK
- Amit K Chattopadhyay
  - School of Engineering and Applied Science, Aston University, Birmingham B4 7ET, UK; Systems Analytics Research Institute, Aston University, Birmingham B4 7ET, UK
- Darren R Flower
  - School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
5
Whitworth DE, Slade SE, Mironas A. Composition of distinct sub-proteomes in Myxococcus xanthus: metabolic cost and amino acid availability. Amino Acids 2015; 47:2521-31. [PMID: 26162436 DOI: 10.1007/s00726-015-2042-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/02/2015] [Accepted: 06/29/2015] [Indexed: 01/05/2023]
Abstract
Subsets of proteins involved in distinct functional processes are subject to different selective pressures. We investigated whether there is an amino acid composition bias (AACB) inherent in discrete subsets of proteins, and whether we could identify changing patterns of AACB during the life cycle of the social bacterium Myxococcus xanthus. We quantitatively characterised the cellular, soluble secreted, and outer membrane vesicle (OMV) sub-proteomes of M. xanthus, identifying 315 proteins. The AACB of the cellular proteome differed only slightly from that deduced from the genome, suggesting that genome-inferred proteomes can accurately reflect the AACB of their host. Inferred AA deficiencies arising from prey consumption were exacerbated by the requirements of the 68%GC genome, whose character thus seems to be selected for directly rather than via the proteome. In our analysis, distinct subsets of the proteome (whether segregated spatially or temporally) exhibited distinct AACB, presumably tailored according to the needs of the organism's lifestyle and nutrient availability. Secreted AAs tend to be of lower cost than those retained in the cell, except for the early developmental A-signal, which is a particularly costly sub-proteome. We propose a model of AA reallocation during the M. xanthus life cycle, involving ribophagy during early starvation and sequestration of limiting AAs within cells during development.
Affiliation(s)
- David E Whitworth
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Ceredigion, SY23 3DD, UK
- Susan E Slade
  - School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
- Adrian Mironas
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Ceredigion, SY23 3DD, UK
6
Reichenberger ER, Rosen G, Hershberg U, Hershberg R. Prokaryotic nucleotide composition is shaped by both phylogeny and the environment. Genome Biol Evol 2015; 7:1380-9. [PMID: 25861819 PMCID: PMC4453058 DOI: 10.1093/gbe/evv063] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Accepted: 04/06/2015] [Indexed: 02/07/2023] Open
Abstract
The causes of the great variation in nucleotide composition of prokaryotic genomes have long been disputed. Here, we use extensive metagenomic and whole-genome data to demonstrate that both phylogeny and the environment shape prokaryotic nucleotide content. We show that across environments, various phyla are characterized by different mean guanine and cytosine (GC) values as well as by the extent of variation on that mean value. At the same time, we show that GC-content varies greatly as a function of environment, in a manner that cannot be entirely explained by disparities in phylogenetic composition. We find environmentally driven differences in nucleotide content not only between highly diverged environments (e.g., soil vs. aquatic vs. human gut) but also within a single type of environment. More specifically, we demonstrate that some human guts are associated with a microbiome that is consistently more GC-rich across phyla, whereas others are associated with a more AT-rich microbiome. These differences appear to be driven both by variations in phylogenetic composition and by environmental differences, which are independent of these phylogenetic composition differences. Combined, our results demonstrate that both phylogeny and the environment significantly affect nucleotide composition and that the environmental differences affecting nucleotide composition are far subtler than previously appreciated.
Affiliation(s)
- Erin R Reichenberger
  - Department of Biomedical Engineering, Science & Health Systems, Drexel University
- Gail Rosen
  - Department of Computer and Electrical Engineering, Drexel University
- Uri Hershberg
  - Department of Biomedical Engineering, Science & Health Systems, Drexel University; Department of Microbiology and Immunology, Drexel University College of Medicine
- Ruth Hershberg
  - Rachel and Menachem Mendelovitch Evolutionary Processes of Mutation and Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, The Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
7
Cooper ED. Overly simplistic substitution models obscure green plant phylogeny. Trends Plant Sci 2014; 19:576-582. [PMID: 25023343 DOI: 10.1016/j.tplants.2014.06.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/19/2014] [Revised: 05/25/2014] [Accepted: 06/05/2014] [Indexed: 06/03/2023]
Abstract
Phylogenetic analysis is an increasingly common and valuable component of plant science. Knowledge of the phylogenetic relationships between plant groups is a prerequisite for understanding the origin and evolution of important plant features, and phylogenetic analysis of individual genes and gene families provides fundamental insights into how those genes and their functions evolved. However, despite an active research community exploring and improving phylogenetic methods, the analytical methods commonly used, and the phylogenetic results they produce, are accorded far more confidence than they warrant. In this opinion article, I emphasise that important parts of the green plant phylogeny are inconsistently resolved and I argue that the lack of consistency arises due to inadequate modelling of changes in the substitution process.
Affiliation(s)
- Endymion D Cooper
  - CMNS-Cell Biology and Molecular Genetics, 2107 Bioscience Research Building, University of Maryland, College Park, MD 20742-4407, USA
8
Fasani RA, Savageau MA. Evolution of a genome-encoded bias in amino acid biosynthetic pathways is a potential indicator of amino acid dynamics in the environment. Mol Biol Evol 2014; 31:2865-78. [PMID: 25118252 PMCID: PMC4209129 DOI: 10.1093/molbev/msu225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Overcoming the stress of starvation is one of an organism’s most challenging phenotypic responses. Those organisms that frequently survive the challenge, by virtue of their fitness, will have evolved genomes that are shaped by their specific environments. Understanding this genotype–environment–phenotype relationship at a deep level will require quantitative predictive models of the complex molecular systems that link these aspects of an organism’s existence. Here, we treat one of the most fundamental molecular systems, protein synthesis, and the amino acid biosynthetic pathways involved in the stringent response to starvation. These systems face an inherent logical dilemma: Building an amino acid biosynthetic pathway to synthesize its product—the cognate amino acid of the pathway—may require that very amino acid when it is no longer available. To study this potential “catch-22,” we have created a generic model of amino acid biosynthesis in response to sudden starvation. Our mathematical analysis and computational results indicate that there are two distinctly different outcomes: Partial recovery to a new steady state, or full system failure. Moreover, the cell’s fate is dictated by the cognate bias, the number of cognate amino acids in the corresponding biosynthetic pathway relative to the average number of that amino acid in the proteome. We test these implications by analyzing the proteomes of over 1,800 sequenced microbes, which reveals statistically significant evidence of low cognate bias, a genetic trait that would avoid the biosynthetic quandary. Furthermore, these results suggest that the pattern of cognate bias, which is readily derived by genome sequencing, may provide evolutionary clues to an organism’s natural environment.
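The cognate bias described in this abstract compares how often an amino acid occurs in the enzymes of its own biosynthetic pathway with how often it occurs in the proteome overall. The Python sketch below shows one possible reading of that definition, as a ratio of per-residue frequencies; the normalisation, the function names and the toy sequences are all assumptions made for illustration and may not match the paper's exact formula.

```python
def residue_fraction(proteins: list[str], residue: str) -> float:
    """Fraction of all residues in the given proteins that equal `residue`."""
    total = sum(len(p) for p in proteins)
    if total == 0:
        raise ValueError("no residues to count")
    return sum(p.count(residue) for p in proteins) / total


def cognate_bias(pathway_proteins: list[str], proteome: list[str], residue: str) -> float:
    """Assumed definition: pathway frequency of the cognate residue
    divided by its proteome-wide frequency (values below 1 mean low bias)."""
    background = residue_fraction(proteome, residue)
    if background == 0:
        raise ValueError("residue absent from proteome")
    return residue_fraction(pathway_proteins, residue) / background


# Hypothetical toy data: one "leucine pathway" enzyme within a two-protein proteome.
pathway = ["MKLA"]
proteome = ["MKLA", "LLLL"]
print(cognate_bias(pathway, proteome, "L"))
```

Under this reading, a value below 1 corresponds to the low cognate bias that the abstract reports as statistically significant across the analysed proteomes.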
Affiliation(s)
- Rick A Fasani
  - Department of Biomedical Engineering and Microbiology Graduate Group, University of California, Davis
- Michael A Savageau
  - Department of Biomedical Engineering and Microbiology Graduate Group, University of California, Davis