1
A Novel cis Element Achieves the Same Solution as an Ancestral cis Element During Thiamine Starvation in Candida glabrata. G3-GENES GENOMES GENETICS 2020; 10:321-331. [PMID: 31732505 PMCID: PMC6945020 DOI: 10.1534/g3.119.400897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
Regulatory networks often converge on very similar cis sequences to drive transcriptional programs due to constraints on what transcription factors are present. To determine the role of constraint loss on cis element evolution, we examined the recent appearance of a thiamine starvation regulated promoter in Candida glabrata. This species lacks the ancestral transcription factor Thi2, but still has the transcription factor Pdc2, which regulates thiamine starvation genes, allowing us to determine the effect of constraint change on a new promoter. We identified two different cis elements in C. glabrata: one present in the evolutionarily recent gene CgPMU3, and the other present in the other thiamine (THI) regulated genes. Reciprocal swaps of the cis elements and incorporation of the S. cerevisiae Thi2 transcription factor-binding site into these promoters demonstrate that the two elements are functionally different from one another. Thus, this loss of an imposed constraint on promoter function has generated a novel cis sequence, suggesting that loss of trans constraints can generate a non-convergent pathway with the same output.
2
Škrlj B, Kunej T, Konc J. Insights from Ion Binding Site Network Analysis into Evolution and Functions of Proteins. Mol Inform 2018; 37:e1700144. [PMID: 29418080 DOI: 10.1002/minf.201700144] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/27/2017] [Accepted: 02/01/2018] [Indexed: 01/05/2023]
Abstract
Many biological phenomena can be represented as complex networks. Using a protein binding site comparison approach, we generated a network of ion binding sites on the scale of all known protein structures from the Protein Data Bank. We found that this ion binding site similarity network is scale-free, indicating a network in which a few ion binding site scaffolds are the network hubs, and these are connected to hundreds of nodes, whereas the vast majority of nodes have only a few neighbors. Enrichment and statistical analysis of the network components and communities yielded insights into underlying processes from the functional and the structural perspective. The largest components and communities were observed to be closely related to basic metabolic processes and some of the most common structural folds, which, from the evolutionary point of view, indicates that they may be the oldest ones. Further, we derived the first comprehensive map of ion interchangeability, based on binding site similarity. Several highly interchangeable protein-ion binding site pairs emerged (e.g., Ca2+ and Mg2+), as well as structurally distinct ones. The constructed network of ion binding site similarities will aid in understanding the general principles of protein-ion binding site structure, function and evolution. We demonstrate potential uses of the network on proteins involved in cancer development and immune response, where individual ions play prominent roles in disease development.
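As a rough illustration of the hub-versus-periphery pattern this abstract describes, the following minimal sketch tabulates a degree distribution from an undirected edge list. The toy edge list and the name `degree_distribution` are assumptions made for illustration; they are not taken from the paper, and a real scale-free analysis would also need a fit to a power law.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Count how many nodes share each degree value.
    return dict(Counter(degree.values()))

# Hypothetical toy network: one hub linked to five sites, plus one extra edge.
toy_edges = [("hub", f"s{i}") for i in range(5)] + [("s0", "s1")]
dist = degree_distribution(toy_edges)
print(dist)
```

In a network like the one described, most of the mass of such a distribution would sit at low degrees, with a small number of nodes at very high degree.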
Affiliation(s)
- Blaž Škrlj
- Department of molecular modeling, National Institute of Chemistry, Hajdrihova 19, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Tanja Kunej
- Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Slovenia
- Janez Konc
- Department of molecular modeling, National Institute of Chemistry, Hajdrihova 19, Ljubljana, Slovenia
3
Czeizler E, Hirvola T, Karhu K. A graph-theoretical approach for motif discovery in protein sequences. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:121-130. [PMID: 28055896 DOI: 10.1109/tcbb.2015.2511750] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Motif recognition is a challenging problem in bioinformatics due to the diversity of protein motifs. Many existing algorithms identify motifs of a given length, and are thus either not applicable or not efficient when searching simultaneously for motifs of various lengths. Searching for gapped motifs, although very important, is a highly time-consuming task due to the combinatorial explosion of possible combinations implied by the consideration of long gaps. We introduce a new graph theoretical approach to identify motifs of various lengths, both with and without gaps. We compare our approach with two widely used methods, MEME and GLAM2, analyzing both the quality of the results and the required computational time. Our method provides results of a slightly higher level of quality than MEME but at a much faster rate, i.e., one eighth of MEME's query time. By using similarity indexing, we drop the query times down to an average of approximately one sixth of the ones required by GLAM2, while achieving a slightly higher level of quality of the results. More precisely, for sequence collections smaller than 50000 bytes GLAM2 is 13 times slower, while being at least as fast as our method on larger ones. The source code of our C++ implementation is freely available on GitHub: https://github.com/hirvolt1/debruijn-motif.
4
Drozdova P, Rogoza T, Radchenko E, Lipaeva P, Mironova L. Transcriptional response to the [ISP(+)] prion of Saccharomyces cerevisiae differs from that induced by the deletion of its structural gene, SFP1. FEMS Yeast Res 2014; 14:1160-70. [PMID: 25227157 DOI: 10.1111/1567-1364.12211] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/03/2014] [Revised: 09/09/2014] [Accepted: 09/09/2014] [Indexed: 12/21/2022] Open
Abstract
Currently, several protein-based genetic determinants, or prions, are described in yeast, and several hundred prion candidates have been predicted. Importantly, many known and potential prion proteins regulate transcription; therefore, prion induction should affect gene expression. While it is generally believed that the prion phenotype should mimic the deletion phenotype, this rule has exceptions. Formed by the transcription factor Sfp1p, [ISP(+)] is one such exception, as the [ISP(+)] and sfp1Δ strains differ in many phenotypic traits. These data suggest that prion formation by a transcription factor and the absence of that factor may affect gene expression in different ways. However, studies addressing this issue are practically absent. Here, we explore how [ISP(+)] affects gene expression and how these changes correspond to the effect of SFP1 deletion. Our data indicate that the [ISP(+)]-related expression changes cannot be explained by the inactivation of Sfp1p. Remarkably, most Sfp1p targets are not affected in the [ISP(+)] strain; instead, the genes upregulated in the [ISP(+)] strain are enriched in Gcn4p and Aft1p targets. We propose that Sfp1p serves as a part of a regulatory complex, and the activity of this complex may be modulated differently by the absence or prionization of Sfp1p.
Affiliation(s)
- Polina Drozdova
- Department of Genetics and Biotechnology, Saint Petersburg State University, St. Petersburg, Russia; Laboratory of Amyloid Biology, Saint Petersburg State University, St. Petersburg, Russia
5
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
6
Pollock DD, de Koning APJ, Kim H, Castoe TA, Churchill MEA, Kechris KJ. Bayesian analysis of high-throughput quantitative measurement of protein-DNA interactions. PLoS One 2011; 6:e26105. [PMID: 22069446 PMCID: PMC3206046 DOI: 10.1371/journal.pone.0026105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/01/2011] [Accepted: 09/19/2011] [Indexed: 11/19/2022] Open
Abstract
Transcriptional regulation depends upon the binding of transcription factor (TF) proteins to DNA in a sequence-dependent manner. Although many experimental methods address the interaction between DNA and proteins, they generally do not comprehensively and accurately assess the full binding repertoire (the complete set of sequences that might be bound with at least moderate strength). Here, we develop and evaluate through simulation an experimental approach that allows simultaneous high-throughput quantitative analysis of TF binding affinity to thousands of potential DNA ligands. Tens of thousands of putative binding targets can be mixed with a TF, and both the pre-bound and bound target pools sequenced. A hierarchical Bayesian Markov chain Monte Carlo approach determines posterior estimates for the dissociation constants, sequence-specific binding energies, and free TF concentrations. A unique feature of our approach is that dissociation constants are jointly estimated from their inferred degree of binding and from a model of binding energetics, depending on how many sequence reads are available and the explanatory power of the energy model. Careful experimental design is necessary to obtain accurate results over a wide range of dissociation constants. This approach, which we call Simultaneous Ultra high-throughput Ligand Dissociation EXperiment (SULDEX), is theoretically capable of rapid and accurate elucidation of an entire TF-binding repertoire.
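For readers unfamiliar with how a dissociation constant relates to the degree of binding, the sketch below uses the textbook single-site equilibrium relation, fraction bound = [TF] / ([TF] + Kd). This is only the standard relationship, not the hierarchical Bayesian model the abstract describes; the function name `fraction_bound` and the numeric values are assumptions for illustration.

```python
def fraction_bound(free_tf, kd):
    """Equilibrium fraction of a DNA site occupied by TF (single-site binding).

    free_tf and kd must be in the same concentration units.
    """
    return free_tf / (free_tf + kd)

# Illustrative values only: at [TF] == Kd, half of the sites are occupied.
half = fraction_bound(1.0, 1.0)
strong = fraction_bound(9.0, 1.0)   # free TF at nine times Kd
print(half, strong)
```

Under this relation, sequences with very small or very large Kd relative to the free TF concentration are nearly fully bound or nearly unbound, which is one way to see why the abstract stresses careful experimental design across a wide range of dissociation constants.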
Affiliation(s)
- David D Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America.
7
Meng G, Mosig A, Vingron M. A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes. BMC Bioinformatics 2010; 11:267. [PMID: 20487530 PMCID: PMC3098066 DOI: 10.1186/1471-2105-11-267] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/03/2009] [Accepted: 05/20/2010] [Indexed: 12/28/2022] Open
Abstract
Background: Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for these motifs. While frequently explored for yeast, the validity of the underlying hypothesis has not been assessed systematically in mammals. This demonstrates the need for a systematic and quantitative evaluation of the degree to which co-expressed genes share over-represented motifs in mammals. Results: We identified 33 experiments for human and mouse in the ArrayExpress Database where transcription factors were manipulated and which exhibited a significant number of differentially expressed genes. We checked for over-representation of transcription factor binding sites in up- or down-regulated genes using the over-representation analysis tool oPOSSUM. In 25 out of 33 experiments, this procedure identified the binding matrices of the affected transcription factors. We also carried out de novo prediction of regulatory motifs shared by differentially expressed genes. Again, the detected motifs shared significant similarity with the matrices of the affected transcription factors. Conclusions: Our results support the claim that functional regulatory motifs are over-represented in sets of differentially expressed genes and that they can be detected with computational methods.
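The idea of motif over-representation can be made concrete with a generic one-sided hypergeometric test, sketched below. This is an assumption-laden illustration rather than the statistic oPOSSUM itself computes (the abstract does not specify it); the function name `hypergeom_pvalue` and the example counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: all genes considered
    successes:  genes whose promoter contains the motif
    draws:      differentially expressed genes
    observed:   differentially expressed genes that contain the motif
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the upper tail of the hypergeometric probability mass function.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 genes, 5 carry the motif, 5 are differentially
# expressed, and all 5 of those carry the motif.
p = hypergeom_pvalue(10, 5, 5, 5)
print(p)
```

A small p-value here would indicate that the motif occurs in the differentially expressed set more often than expected by chance; in practice a multiple-testing correction across motifs would also be needed.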
Affiliation(s)
- Guofeng Meng
- CAS-MPG Partner Institute and Key Laboratory for Computational Biology, Shanghai Institutes for Biological Sciences, 320 Yue Yang Road, 200031, Shanghai, China.
8
The effect of orthology and coregulation on detecting regulatory motifs. PLoS One 2010; 5:e8938. [PMID: 20140085 PMCID: PMC2815771 DOI: 10.1371/journal.pone.0008938] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/31/2009] [Accepted: 01/05/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. METHODOLOGY: We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently. RESULTS AND CONCLUSIONS: Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
9
Yanover C, Singh M, Zaslavsky E. M are better than one: an ensemble-based motif finder and its application to regulatory element prediction. Bioinformatics 2009; 25:868-74. [PMID: 19223448 DOI: 10.1093/bioinformatics/btp090] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Identifying regulatory elements in genomic sequences is a key component in understanding the control of gene expression. Computationally, this problem is often addressed by motif discovery, where the goal is to find a set of mutually similar subsequences within a collection of input sequences. Though motif discovery is widely studied and many approaches to it have been suggested, it remains a challenging and as yet unresolved problem. RESULTS: We introduce SAMF (Solution-Aggregating Motif Finder), a novel approach for motif discovery. SAMF is based on a Markov Random Field formulation, and its key idea is to uncover and aggregate multiple statistically significant solutions to the given motif finding problem. In contrast to many earlier methods, SAMF does not require prior estimates on the number of motif instances present in the data, is not limited by motif length, and allows motifs to overlap. Though SAMF is broadly applicable, these features make it particularly well suited for addressing the challenges of prokaryotic regulatory element detection. We test SAMF's ability to find transcription factor binding sites in an Escherichia coli dataset and show that it outperforms previous methods. Additionally, we uncover a number of previously unidentified binding sites in these data, and provide evidence that they correspond to actual regulatory elements. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Yanover
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
10
Lintner RE, Mishra PK, Srivastava P, Martinez-Vaz BM, Khodursky AB, Blumenthal RM. Limited functional conservation of a global regulator among related bacterial genera: Lrp in Escherichia, Proteus and Vibrio. BMC Microbiol 2008; 8:60. [PMID: 18405378 PMCID: PMC2374795 DOI: 10.1186/1471-2180-8-60] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 11/16/2007] [Accepted: 04/11/2008] [Indexed: 02/03/2023] Open
Abstract
Background: Bacterial genome sequences are being determined rapidly, but few species are physiologically well characterized. Predicting regulation from genome sequences usually involves extrapolation from better-studied bacteria, using the hypothesis that a conserved regulator, conserved target gene, and predicted regulator-binding site in the target promoter imply conserved regulation between the two species. However, many compared organisms are ecologically and physiologically diverse, and the limits of extrapolation have not been well tested. In E. coli K-12 the leucine-responsive regulatory protein (Lrp) affects expression of ~400 genes. Proteus mirabilis and Vibrio cholerae have highly-conserved lrp orthologs (98% and 92% identity to E. coli lrp). The functional equivalence of Lrp from these related species was assessed. Results: Heterologous Lrp regulated gltB, livK and lrp transcriptional fusions in an E. coli background in the same general way as the native Lrp, though with significant differences in extent. Microarray analysis of these strains revealed that the heterologous Lrp proteins significantly influence only about half of the genes affected by native Lrp. In P. mirabilis, heterologous Lrp restored swarming, though with some pattern differences. P. mirabilis produced substantially more Lrp than E. coli or V. cholerae under some conditions. Lrp regulation of target gene orthologs differed among the three native hosts. Strikingly, while Lrp negatively regulates its own gene in E. coli, and was shown to do so even more strongly in P. mirabilis, Lrp appears to activate its own gene in V. cholerae. Conclusion: The overall similarity of regulatory effects of the Lrp orthologs supports the use of extrapolation between related strains for general purposes. However, this study also revealed intrinsic differences even between orthologous regulators sharing >90% overall identity, and 100% identity for the DNA-binding helix-turn-helix motif, as well as differences in the amounts of those regulators. These results suggest that predicting regulation of specific target genes based on genome sequence comparisons alone should be done on a conservative basis.
Affiliation(s)
- Robert E Lintner
- Department of Medical Microbiology and Immunology, University of Toledo Health Sciences Center, Toledo, OH 43614-2598, USA.