1
Fornes O, Meseguer A, Aguirre-Plans J, Gohl P, Bota PM, Molina-Fernández R, Bonet J, Chinchilla-Hernandez A, Pegenaute F, Gallego O, Fernandez-Fuentes N, Oliva B. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024] Open
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. There are numerous high-throughput experimental methods to characterize TF-DNA binding specificities. Their application, however, is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% of human TFs remain unknown; they have neither been determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and to automate the modelling of TF regulatory complexes. We show the advantage of our approach over the classical nearest-neighbor prediction in the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs that are then used to scan a DNA sequence for occurrences. The best matches are either profiled with a binding score or collected for their subsequent modelling into a higher-order regulatory complex with DNA. Co-operativity is modelled by: (i) the co-localization of TFs and (ii) the structural modelling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, which are compared with the experimentally known structures.
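The motif-scanning step mentioned in this abstract can be illustrated with a log-odds scan of a position weight matrix along a DNA sequence. This is a minimal sketch, not the authors' implementation: the three-column matrix `PWM`, the uniform `BACKGROUND`, and the helper names `log_odds`, `scan` and `best_match` are all hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical 3-position motif (illustrative values, not from the paper):
# each column holds the probability of observing each base at that position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(site):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def scan(seq):
    """Score every forward-strand window of motif width; returns (start, score) pairs."""
    width = len(PWM)
    return [(i, log_odds(seq[i:i + width])) for i in range(len(seq) - width + 1)]


def best_match(seq):
    """Return the (start, score) pair of the highest-scoring window."""
    return max(scan(seq), key=lambda hit: hit[1])
```

For example, `best_match("TTAGCTT")` picks the window starting at index 2, because `AGC` is the consensus of this toy matrix. A fuller scan would also score the reverse complement and apply a score threshold.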
Affiliation(s)
- Fornes Oriol
  - Centre for Molecular Medicine and Therapeutics. BC Children's Hospital Research Institute. Department of Medical Genetics. University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Meseguer Alberto
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gohl Patrick
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bota Patricia M
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Molina-Fernández Ruben
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Bonet Jaume
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
  - Laboratory of Protein Design & Immunoengineering. School of Engineering. Ecole Polytechnique Federale de Lausanne. Lausanne 1015, Vaud, Switzerland
- Chinchilla-Hernandez Altair
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Pegenaute Ferran
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Gallego Oriol
  - Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Fernandez-Fuentes Narcis
  - Institute of Biological, Environmental and Rural Science. Aberystwyth University, SY23 3DA Aberystwyth, UK
- Oliva Baldo
  - Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
2
Roy A, Ray S. Traversing DNA-Protein Interactions Between Mesophilic and Thermophilic Bacteria: Implications from Their Cold Shock Response. Mol Biotechnol 2024; 66:824-844. [PMID: 36905463 DOI: 10.1007/s12033-023-00711-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 02/25/2023] [Indexed: 03/12/2023]
Abstract
Cold shock proteins (CSPs) are small, acidic proteins that contain a conserved nucleic acid-binding domain. They assist mRNA translation by acting as "RNA chaperones" when triggered by low temperatures, initiating the cold shock response. CSP-RNA interactions have been predominantly studied. Our focus is the examination of CSP-DNA interactions, analysing the diverse interaction patterns, such as electrostatic, hydrogen-bonding and hydrophobic interactions, in both thermophilic and mesophilic bacteria. The differences in the molecular mechanisms of these contrasting bacterial proteins are studied. Computational techniques such as modelling, energy refinement, simulation and docking were used to obtain data for comparative analysis. The thermostability factors that stabilise a thermophilic bacterium and their effect on its molecular regulation are investigated. Conformational deviation, atomic residual fluctuations, binding affinity, electrostatic energy and solvent accessibility energy were determined during simulation, along with a conformational study. The study revealed that the CSP of the mesophilic bacterium E. coli has a higher binding affinity for DNA than that of the thermophilic G. stearothermophilus. This was further evidenced by low conformational deviation and atomic fluctuations during simulation.
Affiliation(s)
- Alankar Roy
  - Amity Institute of Biotechnology, Amity University, Kolkata, India
- Sujay Ray
  - Amity Institute of Biotechnology, Amity University, Kolkata, India
3
Kulandaisamy A, Srivastava A, Nagarajan R, Gromiha MM. Dissecting and analyzing key residues in protein-DNA complexes. J Mol Recognit 2017; 31. [DOI: 10.1002/jmr.2692] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/13/2017] [Revised: 11/06/2017] [Accepted: 11/06/2017] [Indexed: 02/03/2023]
Affiliation(s)
- A. Kulandaisamy
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences; Indian Institute of Technology Madras; Chennai 600 036 Tamilnadu India
- Ambuj Srivastava
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences; Indian Institute of Technology Madras; Chennai 600 036 Tamilnadu India
- R. Nagarajan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences; Indian Institute of Technology Madras; Chennai 600 036 Tamilnadu India
- M. Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences; Indian Institute of Technology Madras; Chennai 600 036 Tamilnadu India
4
Gapsys V, de Groot BL. Alchemical Free Energy Calculations for Nucleotide Mutations in Protein–DNA Complexes. J Chem Theory Comput 2017; 13:6275-6289. [DOI: 10.1021/acs.jctc.7b00849] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/30/2022]
Affiliation(s)
- Vytautas Gapsys
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Bert L. de Groot
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
5
Smolinska K, Pacholczyk M. EMQIT: a machine learning approach for energy based PWM matrix quality improvement. Biol Direct 2017; 12:17. [PMID: 28764727 PMCID: PMC5539975 DOI: 10.1186/s13062-017-0189-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/05/2017] [Accepted: 07/17/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Transcription factor binding affinities to DNA play a key role in gene regulation. Learning the specificity of the mechanisms by which TFs bind to DNA is important to both experimentalists and theoreticians. With the development of high-throughput methods such as ChIP-seq, the need to provide unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server. We observed that tuning the Boltzmann factor weights, used to convert calculated energies to nucleotide probabilities, has a significant impact on the quality of the associated PWM. Results: Consequently, we proposed using receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights from experimentally verified data in the TRANSFAC database. We applied our method to data available for various TFs. We verified the efficiency of detecting TF binding sites by the 3DTF matrices improved with our technique using experimental data from the TRANSFAC database. The comparison showed a significant similarity and comparable performance between the improved and the experimental matrices (TRANSFAC). Improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (by at least 0.1) and, at the same time, detected notably more experimentally verified TFBSs. Conclusions: The new improved PWMs for the analyzed factors show similarity to TRANSFAC matrices and have comparable predictive capabilities. Moreover, the improved PWMs achieve better results than matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit. Reviewers: This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.
Electronic supplementary material The online version of this article (doi:10.1186/s13062-017-0189-y) contains supplementary material, which is available to authorized users.
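The conversion of calculated energies to nucleotide probabilities through Boltzmann factors, which this abstract identifies as the tunable step, can be sketched for a single motif position as follows. This is an illustrative sketch under stated assumptions, not the EMQIT code: the function name `energies_to_probabilities` and the weight parameter `beta` are hypothetical, lower energy is assumed to mean more favourable binding, and the ROC-based learning of the weights is not shown.

```python
import math


def energies_to_probabilities(energies, beta=1.0):
    """Convert per-base energies at one motif position into probabilities.

    energies: dict mapping base -> energy (assumed: lower is more favourable).
    beta: hypothetical Boltzmann weight; larger values sharpen the distribution.
    """
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {b: math.exp(-beta * (e - e_min)) for b, e in energies.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}
```

With `beta = 0` every base gets probability 0.25, and increasing `beta` concentrates probability on the lowest-energy base, which is why the choice of weight changes the resulting PWM.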
Affiliation(s)
- Karolina Smolinska
  - Institute of Automatic Control, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
- Marcin Pacholczyk
  - Institute of Automatic Control, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
6
Ramsey SA. An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters. Bioinform Biol Insights 2016; 9:59-69. [PMID: 27812284 PMCID: PMC5081247 DOI: 10.4137/bbi.s29330] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/11/2016] [Revised: 09/11/2016] [Accepted: 09/18/2016] [Indexed: 12/24/2022] Open
Abstract
A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences.
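The idea of sampling a binding-site count with Metropolis–Hastings can be illustrated with a deliberately simplified chain. This sketch is not the method of the paper: it swaps the entropy-based move generator for a plain symmetric ±1 proposal, and it assumes a Poisson target whose mean `mean_count` stands in for the site frequency multiplied by the scanned sequence length. The function name and parameters are hypothetical.

```python
import random


def sample_site_counts(mean_count, n_steps=20000, seed=0):
    """Metropolis-Hastings samples of a site count from a Poisson(mean_count) target.

    Assumption: a symmetric +/-1 proposal replaces the paper's move generator.
    """
    rng = random.Random(seed)
    count = 0
    samples = []
    for _ in range(n_steps):
        proposal = count + rng.choice((-1, 1))  # symmetric random-walk move
        if proposal >= 0:  # negative counts have zero probability: reject
            if proposal > count:
                ratio = mean_count / proposal  # p(n + 1) / p(n) for a Poisson
            else:
                ratio = count / mean_count  # p(n - 1) / p(n) for a Poisson
            if rng.random() < ratio:
                count = proposal
        samples.append(count)
    return samples
```

Because the proposal is symmetric, the acceptance probability reduces to the ratio of target probabilities, and the long-run average of the samples approaches `mean_count`.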
Affiliation(s)
- Stephen A Ramsey
  - Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
7
Qian J, Kong X, Deng N, Tan P, Chen H, Wang J, Li Z, Hu Y, Zou W, Xu J, Fang JY. OCT1 is a determinant of synbindin-related ERK signalling with independent prognostic significance in gastric cancer. Gut 2015; 64:37-48. [PMID: 24717932 PMCID: PMC4283676 DOI: 10.1136/gutjnl-2013-306584] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 12/13/2022]
Abstract
OBJECTIVE Octamer transcription factor 1 (OCT1) was found to be expressed in intestinal metaplasia and gastric cancer (GC), but the exact roles of OCT1 in GC remain unclear. The objective of this study was to determine the functional and prognostic implications of OCT1 in GC. DESIGN Expression of OCT1 was examined in paired normal and cancerous gastric tissues and the prognostic significance of OCT1 was analysed by univariate and multivariate survival analyses. The functions of OCT1 on synbindin expression and extracellular signal-regulated kinase (ERK) phosphorylation were studied in vitro and in xenograft mouse models. RESULTS The OCT1 gene is recurrently amplified and upregulated in GC. OCT1 overexpression and amplification are associated with poor survival in patients with GC and the prognostic significance was confirmed by independent patient cohorts. Combining OCT1 overexpression with American Joint Committee on Cancer staging improved the prediction of survival in patients with GC. High expression of OCT1 associates with activation of the ERK mitogen-activated protein kinase signalling pathway in GC tissues. OCT1 functions by transactivating synbindin, which binds to ERK DEF domain and facilitates ERK phosphorylation by MEK. OCT1-synbindin signalling results in the activation of ERK substrates ELK1 and RSK, leading to increased cell proliferation and invasion. Immunofluorescent study of human GC tissue samples revealed strong association between OCT1 protein level and synbindin expression/ERK phosphorylation. Upregulation of OCT1 in mouse xenograft models induced synbindin expression and ERK activation, leading to accelerated tumour growth in vivo. CONCLUSIONS OCT1 is a driver of synbindin-mediated ERK signalling and a promising marker for the prognosis and molecular subtyping of GC.
Affiliation(s)
- Jin Qian
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Xuan Kong
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Niantao Deng
  - Cancer and Stem Cell Biology Program, Duke-NUS Graduate Medical School, Singapore, Singapore
- Patrick Tan
  - Cancer and Stem Cell Biology Program, Duke-NUS Graduate Medical School, Singapore, Singapore
  - Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Singapore, Singapore
- Haoyan Chen
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Jilin Wang
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Zhaoli Li
  - Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin, China
- Ye Hu
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Weiping Zou
  - Department of Surgery, University of Michigan, Ann Arbor, Michigan, USA
- Jie Xu
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
- Jing-Yuan Fang
  - State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, Shanghai, China
8
Ramsey SA, Vengrenyuk Y, Menon P, Podolsky I, Feig JE, Aderem A, Fisher EA, Gold ES. Epigenome-guided analysis of the transcriptome of plaque macrophages during atherosclerosis regression reveals activation of the Wnt signaling pathway. PLoS Genet 2014; 10:e1004828. [PMID: 25474352 PMCID: PMC4256277 DOI: 10.1371/journal.pgen.1004828] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/31/2014] [Accepted: 10/15/2014] [Indexed: 11/19/2022] Open
Abstract
We report the first systems biology investigation of regulators controlling arterial plaque macrophage transcriptional changes in response to lipid lowering in vivo in two distinct mouse models of atherosclerosis regression. Transcriptome measurements from plaque macrophages from the Reversa mouse were integrated with measurements from an aortic transplant-based mouse model of plaque regression. Functional relevance of the genes detected as differentially expressed in plaque macrophages in response to lipid lowering in vivo was assessed through analysis of gene functional annotations, overlap with in vitro foam cell studies, and overlap of associated eQTLs with human atherosclerosis/CAD risk SNPs. To identify transcription factors that control plaque macrophage responses to lipid lowering in vivo, we used an integrative strategy – leveraging macrophage epigenomic measurements – to detect enrichment of transcription factor binding sites upstream of genes that are differentially expressed in plaque macrophages during regression. The integrated analysis uncovered eight transcription factor binding site elements that were statistically overrepresented within the 5′ regulatory regions of genes that were upregulated in plaque macrophages in the Reversa model under maximal regression conditions and within the 5′ regulatory regions of genes that were upregulated in the aortic transplant model during regression. Of these, the TCF/LEF binding site was present in promoters of upregulated genes related to cell motility, suggesting that the canonical Wnt signaling pathway may be activated in plaque macrophages during regression. We validated this network-based prediction by demonstrating that β-catenin expression is higher in regressing (vs. control group) plaques in both regression models, and we further demonstrated that stimulation of canonical Wnt signaling increases macrophage migration in vitro. 
These results suggest involvement of canonical Wnt signaling in macrophage emigration from the plaque during lipid lowering-induced regression, and they illustrate the discovery potential of an epigenome-guided, systems approach to understanding atherosclerosis regression. Atherosclerosis, a progressive accumulation of lipid-rich plaque within arteries, is an inflammatory disease in which the response of macrophages (a key cell type of the innate immune system) to plasma lipoproteins plays a central role. In humans, the goal of significantly reducing already-established plaque through drug treatments, including statins, remains elusive. In mice, atherosclerosis can be reversed by experimental manipulations that lower circulating lipid levels. A common feature of many regression models is that macrophages transition to a less inflammatory state and emigrate from the plaque. While the molecular regulators that control these responses are largely unknown, we hypothesized that by integrating global measurements of macrophage gene expression in regressing plaques with measurements of the macrophage chromatin landscape, we could identify key molecules that control macrophage responses to the lowering of circulating lipid levels. Our systems biology analysis of plaque macrophages yielded a network in which the Wnt signaling pathway emerged as a candidate upstream regulator. Wnt signaling is known to affect both inflammation and the ability of macrophages to migrate from one location to another, and our targeted validation studies provide evidence that Wnt signaling is increased in plaque macrophages during regression. Our findings both demonstrate the power of a systems approach to uncover candidate regulators of regression and to identify a potential new therapeutic target.
MESH Headings
- Animals
- Cells, Cultured
- Epigenesis, Genetic/drug effects
- Epigenesis, Genetic/physiology
- Female
- Gene Expression Profiling
- Genome/drug effects
- Hypolipidemic Agents/pharmacology
- Hypolipidemic Agents/therapeutic use
- Macrophages/drug effects
- Macrophages/metabolism
- Macrophages/pathology
- Mice
- Mice, Inbred C57BL
- Mice, Knockout
- Microarray Analysis
- Plaque, Atherosclerotic/drug therapy
- Plaque, Atherosclerotic/genetics
- Plaque, Atherosclerotic/metabolism
- Plaque, Atherosclerotic/pathology
- Receptors, LDL/genetics
- Remission Induction
- Transcriptome/drug effects
- Wnt Signaling Pathway/drug effects
- Wnt Signaling Pathway/genetics
Affiliation(s)
- Stephen A. Ramsey
  - Department of Biomedical Sciences and School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America
- Yuliya Vengrenyuk
  - Division of Cardiology, School of Medicine, New York University, New York, New York, United States of America
- Prashanthi Menon
  - Division of Cardiology, School of Medicine, New York University, New York, New York, United States of America
- Irina Podolsky
  - Seattle Biomedical Research Institute, Seattle, Washington, United States of America
- Jonathan E. Feig
  - Division of Cardiology, School of Medicine, New York University, New York, New York, United States of America
- Alan Aderem
  - Seattle Biomedical Research Institute, Seattle, Washington, United States of America
- Edward A. Fisher
  - Division of Cardiology, School of Medicine, New York University, New York, New York, United States of America
- Elizabeth S. Gold
  - Seattle Biomedical Research Institute, Seattle, Washington, United States of America
9
Joyce AP, Zhang C, Bradley P, Havranek JJ. Structure-based modeling of protein:DNA specificity. Brief Funct Genomics 2014; 14:39-49. [PMID: 25414269 DOI: 10.1093/bfgp/elu044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
Protein:DNA interactions are essential to a range of processes that maintain and express the information encoded in the genome. Structural modeling is an approach that aims to understand these interactions at the physicochemical level. It has been proposed that structural modeling can lead to deeper understanding of the mechanisms of protein:DNA interactions, and that progress in this field can not only help to rationalize the observed specificities of DNA-binding proteins but also to allow researchers to engineer novel DNA site specificities. In this review we discuss recent developments in the structural description of protein:DNA interactions and specificity, as well as the challenges facing the field in the future.
10
On the use of knowledge-based potentials for the evaluation of models of protein-protein, protein-DNA, and protein-RNA interactions. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:77-120. [PMID: 24629186 DOI: 10.1016/b978-0-12-800168-4.00004-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/11/2022]
Abstract
Proteins are the bricks and mortar of cells, playing structural and functional roles. In order to perform their function, they interact with each other as well as with other biomolecules such as DNA or RNA. Therefore, to fathom the function of a protein, we require knowing its partners and the atomic details of its interactions (i.e., the structure of the complex). However, the amount of protein interactions with an experimentally determined three-dimensional structure is scarce. Therefore, computational techniques such as homology modeling are foremost to fill this gap. Protein interactions can be modeled using as templates the interactions of homologous proteins, if the structure of the complex is known, or using docking methods. In both approaches, the estimation of the quality of models is essential. There are several ways to address this problem. In this review, we focus on the use of knowledge-based potentials for the analysis of protein interactions. We describe the procedure to derive statistical potentials and split them into different energetic terms that can be used for different purposes. We extensively discuss the fields where knowledge-based potentials have been successfully applied to (1) model protein-protein, protein-DNA, and protein-RNA interactions and (2) predict binding sites (in the protein and in the DNA). Moreover, we provide ready-to-use resources for docking and benchmarking protein interactions.
11
Abstract
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. 
With PiDNA, the users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. As well, it is expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
Affiliation(s)
- Chih-Kang Lin
  - Center for Systems Biology, National Taiwan University, Taipei 106, Taiwan
12
Hosseini P, Ovcharenko I, Matthews BF. Using an ensemble of statistical metrics to quantify large sets of plant transcription factor binding sites. PLANT METHODS 2013; 9:12. [PMID: 23578135 PMCID: PMC3639912 DOI: 10.1186/1746-4811-9-12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2012] [Accepted: 03/28/2013] [Indexed: 05/07/2023]
Abstract
BACKGROUND From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably-linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in promoters of differentially expressed genes can be used to identify binding site patterns in promoters that are closely related to stress-response. Output from today's transcriptomic assays necessitates statistically-oriented software to handle large promoter-sequence sets in a computationally tractable fashion. RESULTS We present Marina, an open-source software for identifying over-represented TFBSs from amongst large sets of promoter sequences, using an ensemble of 7 statistical metrics and binding-site profiles. Through software comparison, we show that Marina can identify considerably more over-represented plant TFBSs compared to a popular software alternative. CONCLUSIONS Marina was used to identify over-represented TFBSs in a two time-point RNA-Seq study exploring the transcriptomic interplay between soybean (Glycine max) and soybean rust (Phakopsora pachyrhizi). Marina identified numerous abundant TFBSs recognized by transcription factors that are associated with defense-response such as WRKY, HY5 and MYB2. Comparing results from Marina to that of a popular software alternative suggests that regardless of the number of promoter-sequences, Marina is able to identify significantly more over-represented TFBSs.
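Testing whether a binding site is over-represented in a set of promoters can be illustrated with a hypergeometric upper-tail probability. This is a generic sketch of one commonly used statistic; the abstract does not say which seven metrics Marina combines, so nothing here should be read as Marina's implementation. The function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_all, n_all_with_site, n_selected, n_selected_with_site):
    """P(X >= n_selected_with_site) under a hypergeometric null.

    n_all: promoters in the background set.
    n_all_with_site: background promoters containing the binding site.
    n_selected: promoters of differentially expressed genes (drawn from the background).
    n_selected_with_site: selected promoters containing the binding site.
    """
    total = comb(n_all, n_selected)
    upper = min(n_all_with_site, n_selected)
    favourable = sum(
        comb(n_all_with_site, i) * comb(n_all - n_all_with_site, n_selected - i)
        for i in range(n_selected_with_site, upper + 1)
    )
    return favourable / total
```

A small tail probability suggests the site occurs in the selected promoters more often than expected by chance; in practice the values would also need correction for testing many binding-site profiles.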
Affiliation(s)
- Parsa Hosseini
  - Department of Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia, USA
  - Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Beltsville, Maryland, USA
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
- Benjamin F Matthews
  - Soybean Genomics and Improvement Laboratory, United States Department of Agriculture, Beltsville, Maryland, USA
13
Haubrock M, Li J, Wingender E. Using potential master regulator sites and paralogous expansion to construct tissue-specific transcriptional networks. BMC SYSTEMS BIOLOGY 2012; 6 Suppl 2:S15. [PMID: 23282021 PMCID: PMC3521180 DOI: 10.1186/1752-0509-6-s2-s15] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023]
Abstract
Background: Transcriptional networks of higher eukaryotes are difficult to obtain. Available experimental data from conventional approaches are sporadic, while those generated with modern high-throughput technologies are biased. Computational predictions are generally perceived as being flooded with high rates of false positives. New concepts about the structure of regulatory regions and the function of master regulator sites may provide a way out of this dilemma. Methods: We combined promoter scanning using positional weight matrices with a 4-genome conservation analysis to predict high-affinity, highly conserved transcription factor (TF) binding sites and to infer TF-target gene relations. These relations were expanded to paralogous TFs and filtered for tissue-specific expression patterns to obtain a reference transcriptional network (RTN) as well as tissue-specific transcriptional networks (TTNs). Results: When validated with experimental data sets, the predictions showed the expected trends of true positive and true negative predictions, resulting in satisfactory sensitivity and specificity characteristics. This also showed that confining the network reconstruction to the top-ranking 1% of TF-target predictions gives rise to networks with the expected degree distributions. Their expansion to paralogous TFs enriches them with tissue-specific regulators, providing a reasonable basis to reconstruct tissue-specific transcriptional networks. Conclusions: The concept of master regulator or seed sites provides a reasonable starting point to select predicted TF-target relations, which, together with a paralogous expansion, allow for the reconstruction of tissue-specific transcriptional networks.
Affiliation(s)
- Martin Haubrock
  - Department of Bioinformatics, University Medical Center Göttingen, Goldschmidtstrasse 1, D-37077 Göttingen, Germany