Zhao G, Carson MB, Lu H. Prediction of specific protein-DNA recognition by knowledge-based two-body and three-body interaction potentials.
ACTA ACUST UNITED AC 2008; 2007:5017-20. [PMID: 18003133 DOI: 10.1109/iembs.2007.4353467]
[Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Gene regulation requires specific protein-DNA interactions. Detecting the short and variable DNA sequences in gene promoter regions to which transcription factors (TF) bind is a difficult challenge in bioinformatics. Here we have developed two-body and three-body interaction potentials that are able to assess protein-DNA interaction and achieve a higher level of specificity in the recognition of TF-binding sites. The potentials were calculated using experimentally characterized 3-D structures of protein-DNA complexes. We implemented two approaches in order to evaluate the potentials. Using the first method, we calculated the Z-score of the potential energy of a true TF-binding sequence when compared to 50,000 randomly generated DNA sequences. The second method allowed us to take advantage of the ability of statistical potentials to recognize novel TF-binding sites within the promoter region of genes. We found that the three-body potential, which takes into account the interaction between a DNA base and a protein residue with regard to the effect of a neighboring DNA base, had a better average Z-score than that of the two-body potential. This neighbor effect suggests that the local conformation of DNA does play a critical role in specific residue-base recognition. In all cases, the potentials developed here outperformed published results. The two sets of potentials were tested further by applying them in genome-scale TF-binding site prediction for the CRP protein in E. coli. Out of the 142 cases, 28% of the true binding sites ranked first (i.e.
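The Z-score evaluation described in the abstract can be illustrated with a minimal Python sketch. It scores a candidate binding sequence against a background of randomly generated DNA sequences of the same length. The energy function `toy_energy`, its made-up consensus string, the uniform base composition of the background, and the helper names are all assumptions for illustration; they are not the knowledge-based two-body or three-body potentials from the paper, which would have to be supplied in place of `toy_energy`.

```python
import random
import statistics


def z_score(true_energy, background_energies):
    """Standard Z-score of one energy against a background sample."""
    mean = statistics.fmean(background_energies)
    std = statistics.pstdev(background_energies)
    if std == 0:
        raise ValueError("background energies have zero variance")
    return (true_energy - mean) / std


def random_dna(length, rng):
    """Random DNA string with uniform base frequencies (an assumption)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def toy_energy(seq):
    """Hypothetical placeholder energy, NOT the paper's potential.

    Each position matching a made-up consensus contributes -1,
    so lower values mean a more favorable (more specific) score.
    """
    consensus = "TGTGA"
    return -sum(1 for a, b in zip(seq, consensus) if a == b)


def score_site(site, energy_fn, n_random=50_000, seed=0):
    """Z-score of `site` against `n_random` random sequences of equal length."""
    rng = random.Random(seed)
    background = [
        energy_fn(random_dna(len(site), rng)) for _ in range(n_random)
    ]
    return z_score(energy_fn(site), background)


if __name__ == "__main__":
    print(score_site("TGTGA", toy_energy))
```

With an energy convention in which lower is better, a more negative Z-score indicates that the true site is better separated from the random background. The default of 50,000 background sequences mirrors the figure quoted in the abstract; the fixed seed only makes the sketch reproducible.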