1
Lawson CL, Berman H, Chen L, Vallat B, Zirbel C. The Nucleic Acid Knowledgebase: a new portal for 3D structural information about nucleic acids. Nucleic Acids Res 2024; 52:D245-D254. [PMID: 37953312 PMCID: PMC10767938 DOI: 10.1093/nar/gkad957] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/10/2023] [Revised: 10/02/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023] Open
Abstract
The Nucleic Acid Knowledgebase (nakb.org) is a new data resource, updated weekly, for experimentally determined 3D structures containing DNA and/or RNA nucleic acid polymers and their biological assemblies. NAKB indexes nucleic acid-containing structures derived from all major structure determination methods (X-ray, NMR and EM), including all held by the Protein Data Bank (PDB). As the planned successor to the Nucleic Acid Database (NDB), NAKB's design preserves all functionality of the NDB and provides novel nucleic acid-centric content, including structural and functional annotations, as well as annotations from and links to external resources. A variety of custom interactive tools have been developed to enable rapid exploration and drill-down of NAKB's content.
Affiliation(s)
- Catherine L Lawson
  - Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Helen M Berman
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Li Chen
  - Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Brinda Vallat
  - Institute for Quantitative Biomedicine, Rutgers, State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Craig L Zirbel
  - Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
2
Olivencia MA, Villegas-Esguevillas M, Sancho M, Barreira B, Paternoster E, Adão R, Larriba MJ, Cogolludo A, Perez-Vizcaino F. Vitamin D Receptor Deficiency Upregulates Pulmonary Artery Kv7 Channel Activity. Int J Mol Sci 2023; 24:12350. [PMID: 37569725 PMCID: PMC10418734 DOI: 10.3390/ijms241512350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 07/26/2023] [Accepted: 07/29/2023] [Indexed: 08/13/2023] Open
Abstract
Recent evidence suggests that vitamin D is involved in the development of pulmonary arterial hypertension (PAH). The aim of this study was to analyze the electrophysiological and contractile properties of pulmonary arteries (PAs) in vitamin D receptor knockout mice (Vdr-/-). PAs were dissected and mounted in a wire myograph. Potassium membrane currents were recorded in freshly isolated PA smooth muscle cells (PASMCs) using the conventional whole-cell configuration of the patch-clamp technique. Potential vitamin D response elements (VDREs) in Kv7 channels coding genes were studied, and their protein expression was analyzed. Vdr-/- mice did not show a pulmonary hypertensive phenotype, as neither right ventricular hypertrophy nor endothelial dysfunction was apparent. However, resistance PA from these mice exhibited increased response to retigabine, a Kv7 activator, compared to controls and heterozygous mice. Furthermore, the current sensitive to XE991, a Kv7 inhibitor, was also higher in PASMCs from knockout mice. A possible VDRE was found in the gene coding for KCNE4, the regulatory subunit of Kv7.4. Accordingly, Vdr-/- mice showed an increased expression of KCNE4 in the lungs, with no changes in Kv7.1 and Kv7.4. These results indicate that the absence of Vdr in mice, as occurred with vitamin D deficient rats, is not sufficient to induce PAH. However, the contribution of Kv7 channel currents to the regulation of PA tone is increased in Vdr-/- mice, resembling animals and humans suffering from PAH.
Affiliation(s)
- Miguel A Olivencia
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- Marta Villegas-Esguevillas
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- Maria Sancho
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
  - Department of Physiology, School of Medicine, Universidad Complutense de Madrid, 28040 Madrid, Spain
- Bianca Barreira
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- Elena Paternoster
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- Rui Adão
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- María Jesús Larriba
  - Instituto de Investigaciones Biomédicas Alberto Sols, Consejo Superior de Investigaciones Científicas, Universidad Autónoma de Madrid, 28029 Madrid, Spain
  - Ciber Cáncer (CIBERONC), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria del Hospital Universitario La Paz (IdiPAZ), 28029 Madrid, Spain
- Angel Cogolludo
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
- Francisco Perez-Vizcaino
  - Department of Pharmacology and Toxicology, School of Medicine, University Complutense of Madrid, 28040 Madrid, Spain
  - Ciber Enfermedades Respiratorias (CIBERES), 28029 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), 28009 Madrid, Spain
3
Yan Y, Huang T. The Interactome of Protein, DNA, and RNA. Methods Mol Biol 2023; 2695:89-110. [PMID: 37450113 DOI: 10.1007/978-1-0716-3346-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
Proteins participate in many processes of the organism and are very important for maintaining the health of the organism. However, proteins cannot function independently in the body. They must interact with proteins, DNA, RNA, and other substances to perform biological functions and maintain the body's health. At present, there are many experimental methods and software tools that can detect and predict the interaction between proteins and other substances. There are also many databases that record the interaction between proteins and other substances. This article mainly describes protein-protein, protein-DNA, and protein-RNA interactions in detail by introducing some commonly used experimental methods, the software tools produced with the accumulation of experimental data and the rapid development of machine learning, and the related databases that record the relationship between proteins and some substances. By this review, we hope that through the analysis and summary of various aspects, it will be convenient for researchers to conduct further research on protein interactions.
Affiliation(s)
- Yuyao Yan
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
4
Ghani NSA, Emrizal R, Moffit SM, Hamdani HY, Ramlan EI, Firdaus-Raih M. GrAfSS: a webserver for substructure similarity searching and comparisons in the structures of proteins and RNA. Nucleic Acids Res 2022; 50:W375-W383. [PMID: 35639505 PMCID: PMC9252811 DOI: 10.1093/nar/gkac402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 04/28/2022] [Accepted: 05/08/2022] [Indexed: 12/03/2022] Open
Abstract
The GrAfSS (Graph theoretical Applications for Substructure Searching) webserver is a platform to search for three-dimensional substructures of: (i) amino acid side chains in protein structures; and (ii) base arrangements in RNA structures. The webserver interfaces the functions of five different graph theoretical algorithms – ASSAM, SPRITE, IMAAAGINE, NASSAM and COGNAC – into a single substructure searching suite. Users will be able to identify whether a three-dimensional (3D) arrangement of interest, such as a ligand binding site or 3D motif, observed in a protein or RNA structure can be found in other structures available in the Protein Data Bank (PDB). The webserver also allows users to determine whether a protein or RNA structure of interest contains substructural arrangements that are similar to known motifs or 3D arrangements. These capabilities allow for the functional annotation of new structures that were either experimentally determined or computationally generated (such as the coordinates generated by AlphaFold2) and can provide further insights into the diversity or conservation of functional mechanisms of structures in the PDB. The computed substructural superpositions are visualized using integrated NGL viewers. The GrAfSS server is available at http://mfrlab.org/grafss/.
Affiliation(s)
- Nur Syatila Ab Ghani
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Reeki Emrizal
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Sabrina Mohamed Moffit
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
- Hazrina Yusof Hamdani
  - Advanced Medical and Dental Institute, Universiti Sains Malaysia, Bertam, Kepala Batas 13200, Pulau Pinang, Malaysia
- Mohd Firdaus-Raih
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
5
Biró B, Zhao B, Kurgan L. Complementarity of the residue-level protein function and structure predictions in human proteins. Comput Struct Biotechnol J 2022; 20:2223-2234. [PMID: 35615015 PMCID: PMC9118482 DOI: 10.1016/j.csbj.2022.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Revised: 05/02/2022] [Accepted: 05/02/2022] [Indexed: 11/24/2022] Open
Abstract
Sequence-based predictors of the residue-level protein function and structure cover a broad spectrum of characteristics including intrinsic disorder, secondary structure, solvent accessibility and binding to nucleic acids. They were catalogued and evaluated in numerous surveys and assessments. However, methods focusing on a given characteristic are studied separately from predictors of other characteristics, while they are typically used on the same proteins. We fill this void by studying complementarity of a representative collection of methods that target different predictions using a large, taxonomically consistent, and low similarity dataset of human proteins. First, we bridge the gap between the communities that develop structure-trained vs. disorder-trained predictors of binding residues. Motivated by a recent study of the protein-binding residue predictions, we empirically find that combining the structure-trained and disorder-trained predictors of the DNA-binding and RNA-binding residues leads to substantial improvements in predictive quality. Second, we investigate whether diverse predictors generate results that accurately reproduce relations between secondary structure, solvent accessibility, interaction sites, and intrinsic disorder that are present in the experimental data. Our empirical analysis concludes that predictions accurately reflect all combinations of these relations. Altogether, this study provides unique insights that support combining results produced by diverse residue-level predictors of protein function and structure.
Affiliation(s)
- Bálint Biró
  - Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Gödöllő, Hungary
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
6
Zhang F, Zhao B, Shi W, Li M, Kurgan L. DeepDISOBind: accurate prediction of RNA-, DNA- and protein-binding intrinsically disordered residues with deep multi-task learning. Brief Bioinform 2021; 23:6461158. [PMID: 34905768 DOI: 10.1093/bib/bbab521] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/02/2021] [Revised: 10/30/2021] [Accepted: 11/14/2021] [Indexed: 12/14/2022] Open
Abstract
Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.
Affiliation(s)
- Fuhao Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Bi Zhao
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Wenbo Shi
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
7
Jiang L, Guo F, Tang J, Yu H, Ness S, Duan M, Mao P, Zhao YY, Guo Y. SBSA: an online service for somatic binding sequence annotation. Nucleic Acids Res 2021; 50:e4. [PMID: 34606615 PMCID: PMC8500130 DOI: 10.1093/nar/gkab877] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/20/2021] [Revised: 09/10/2021] [Accepted: 09/17/2021] [Indexed: 12/11/2022] Open
Abstract
Efficient annotation of alterations in binding sequences of molecular regulators can help identify novel candidates for mechanisms study and offer original therapeutic hypotheses. In this work, we developed Somatic Binding Sequence Annotator (SBSA) as a full-capacity online tool to annotate altered binding motifs/sequences, addressing diverse types of genomic variants and molecular regulators. The genomic variants can be somatic mutation, single nucleotide polymorphism, RNA editing, etc. The binding motifs/sequences involve transcription factors (TFs), RNA-binding proteins, miRNA seeds, miRNA-mRNA 3′-UTR binding target, or can be any custom motifs/sequences. Compared to similar tools, SBSA is the first to support miRNA seeds and miRNA-mRNA 3′-UTR binding target, and it unprecedentedly implements a personalized genome approach that accommodates joint adjacent variants. SBSA is empowered to support an indefinite species, including preloaded reference genomes for SARS-Cov-2 and 25 other common organisms. We demonstrated SBSA by annotating multi-omics data from over 30,890 human subjects. Of the millions of somatic binding sequences identified, many are with known severe biological repercussions, such as the somatic mutation in TERT promoter region which causes a gained binding sequence for E26 transformation-specific factor (ETS1). We further validated the function of this TERT mutation using experimental data in cancer cells. Availability: http://innovebioinfo.com/Annotation/SBSA/SBSA.php.
Affiliation(s)
- Limin Jiang
  - Faculty of Life Science & Medicine, Northwest University, No. 229 Taibai North Road, Xi'an 710069, China
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Hui Yu
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87109, USA
- Scott Ness
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87109, USA
- Mingrui Duan
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87109, USA
- Peng Mao
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87109, USA
- Ying-Yong Zhao
  - Faculty of Life Science & Medicine, Northwest University, No. 229 Taibai North Road, Xi'an 710069, China
- Yan Guo
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87109, USA
8
Cabrera JJ, Jiménez-Leiva A, Tomás-Gallardo L, Parejo S, Casado S, Torres MJ, Bedmar EJ, Delgado MJ, Mesa S. Dissection of FixK2 protein-DNA interaction unveils new insights into Bradyrhizobium diazoefficiens lifestyles control. Environ Microbiol 2021; 23:6194-6209. [PMID: 34227211 DOI: 10.1111/1462-2920.15661] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/26/2021] [Revised: 07/01/2021] [Accepted: 07/03/2021] [Indexed: 11/28/2022]
Abstract
The FixK2 protein plays a pivotal role in a complex regulatory network, which controls genes for microoxic, denitrifying, and symbiotic nitrogen-fixing lifestyles in Bradyrhizobium diazoefficiens. Among the microoxic-responsive FixK2-activated genes are the fixNOQP operon, indispensable for respiration in symbiosis, and the nnrR regulatory gene needed for the nitric-oxide dependent induction of the norCBQD genes encoding the denitrifying nitric oxide reductase. FixK2 is a CRP/FNR-type transcription factor, which recognizes a 14 bp-palindrome (FixK2 box) at the regulated promoters through three residues (L195, E196, and R200) within a C-terminal helix-turn-helix motif. Here, we mapped the determinants for discriminatory FixK2-mediated regulation. While R200 was essential for DNA binding and activity of FixK2, L195 was involved in protein-DNA complex stability. Mutation at positions 1, 3, or 11 in the genuine FixK2 box at the fixNOQP promoter impaired transcription activation by FixK2, which was residual when a second mutation affecting the box palindromy was introduced. The substitution of nucleotide 11 within the NnrR box at the norCBQD promoter allowed FixK2-mediated activation in response to microoxia. Thus, position 11 within the FixK2/NnrR boxes constitutes a key element that changes FixK2 targets specificity, and consequently, it might modulate B. diazoefficiens lifestyle as nitrogen fixer or as denitrifier.
Affiliation(s)
- Juan J Cabrera
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- Andrea Jiménez-Leiva
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- Laura Tomás-Gallardo
  - Proteomics and Biochemistry Unit, Andalusian Centre for Developmental Biology, CSIC-Junta de Andalucía-Pablo de Olavide University, Seville, 41013, Spain
- Sergio Parejo
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- Sara Casado
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- María J Torres
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- Eulogio J Bedmar
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- María J Delgado
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
- Socorro Mesa
  - Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, CSIC, Granada, 18008, Spain
9
Meseguer A, Årman F, Fornes O, Molina-Fernández R, Bonet J, Fernandez-Fuentes N, Oliva B. On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF. NAR Genom Bioinform 2021; 2:lqaa046. [PMID: 33575598 PMCID: PMC7671317 DOI: 10.1093/nargab/lqaa046] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/13/2020] [Revised: 05/07/2020] [Accepted: 06/10/2020] [Indexed: 12/25/2022] Open
Abstract
Cis2-His2 zinc finger (C2H2-ZF) proteins are the largest family of transcription factors in human and higher metazoans. To date, the DNA-binding preferences of many members of this family remain unknown. We have developed a computational method to predict their DNA-binding preferences. We have computed theoretical position weight matrices (PWMs) of proteins composed by C2H2-ZF domains, with the only requirement of an input structure. We have predicted more than two-third of a single zinc-finger domain binding site for about 70% variants of Zif268, a classical member of this family. We have successfully matched between 60 and 90% of the binding-site motif of examples of proteins composed by three C2H2-ZF domains in JASPAR, a standard database of PWMs. The tests are used as a proof of the capacity to scan a DNA fragment and find the potential binding sites of transcription-factors formed by C2H2-ZF domains. As an example, we have tested the approach to predict the DNA-binding preferences of the human chromatin binding factor CTCF. We offer a server to model the structure of a zinc-finger protein and predict its PWM.
Affiliation(s)
- Alberto Meseguer
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Filip Årman
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Ruben Molina-Fernández
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
- Jaume Bonet
  - Laboratory of Protein Design & Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Vaud, Switzerland
- Narcis Fernandez-Fuentes
  - Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic, Catalonia 08500, Spain
- Baldo Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona, Catalonia 08005, Spain
10
Callejo M, Mondejar-Parreño G, Morales-Cano D, Barreira B, Esquivel-Ruiz S, Olivencia MA, Manaud G, Perros F, Duarte J, Moreno L, Cogolludo A, Perez-Vizcaíno F. Vitamin D deficiency downregulates TASK-1 channels and induces pulmonary vascular dysfunction. Am J Physiol Lung Cell Mol Physiol 2020; 319:L627-L640. [DOI: 10.1152/ajplung.00475.2019] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022] Open
Abstract
Vitamin D (VitD) receptor regulates the expression of several genes involved in signaling pathways affected in pulmonary hypertension (PH). VitD deficiency is highly prevalent in PH, and low levels are associated with poor prognosis. We investigated if VitD deficiency may predispose to or exacerbate PH. Male Wistar rats were fed with a standard or a VitD-free diet for 5 wk. Next, rats were further divided into controls or PH, which was induced by a single dose of Su-5416 (20 mg/kg) and exposure to hypoxia (10% O2) for 2 wk. VitD deficiency had no effect on pulmonary pressure in normoxic rats, indicating that, by itself, it does not trigger PH. However, it induced several moderate but significant changes characteristic of PH in the pulmonary arteries, such as increased muscularization, endothelial dysfunction, increased survivin, and reduced bone morphogenetic protein (Bmp) 4, Bmp6, DNA damage-inducible transcript 4, and K+ two-pore domain channel subfamily K member 3 (Kcnk3) expression. Myocytes isolated from pulmonary arteries from VitD-deficient rats had a reduced whole voltage-dependent potassium current density and acid-sensitive (TASK-like) potassium currents. In rats with PH induced by Su-5416 plus hypoxia, VitD-free diet induced a modest increase in pulmonary pressure, worsened endothelial function, increased the hyperreactivity to serotonin, arterial muscularization, decreased total and TASK-1 potassium currents, and further depolarized the pulmonary artery smooth muscle cell membrane. In human pulmonary artery smooth muscle cells from controls and patients with PH, the active form of VitD calcitriol significantly increased KCNK3 mRNA expression. Altogether, these data strongly suggest that the deficit in VitD induces pulmonary vascular dysfunction.
Collapse
Affiliation(s)
- Maria Callejo
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Gema Mondejar-Parreño
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Daniel Morales-Cano
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Bianca Barreira
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Sergio Esquivel-Ruiz
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Miguel Angel Olivencia
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Grégoire Manaud
- Université Paris–Saclay, AP-HP, INSERM UMR_S 999, Service de Pneumologie et Soins Intensifs Respiratoires, Hôpital de Bicêtre, Le Kremlin Bicêtre, France
| | - Frédéric Perros
- Université Paris–Saclay, AP-HP, INSERM UMR_S 999, Service de Pneumologie et Soins Intensifs Respiratoires, Hôpital de Bicêtre, Le Kremlin Bicêtre, France
| | - Juan Duarte
- Department of Pharmacology, School of Pharmacy, Universidad de Granada, Granada, Spain
- Ciber Enfermedades Cardiovasculares, Madrid, Spain
| | - Laura Moreno
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Angel Cogolludo
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| | - Francisco Perez-Vizcaíno
- Department of Pharmacology and Toxicology, School of Medicine, Universidad Complutense de Madrid, Madrid, Spain
- CIBER Enfermedades Respiratorias, Madrid, Spain
- Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
| |
|
11
|
Adam K, Gyorgypal Z, Hegedus Z. DNA Readout Viewer (DRV): visualization of specificity determining patterns of protein-binding DNA segments. Bioinformatics 2020; 36:2286-2287. [PMID: 31793988 PMCID: PMC7141859 DOI: 10.1093/bioinformatics/btz906] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/18/2019] [Revised: 11/27/2019] [Accepted: 11/29/2019] [Indexed: 11/14/2022] Open
Abstract
Summary: The sequence-specific recognition of DNA by regulatory proteins typically occurs by establishing hydrogen bonds and non-bonded contacts between chemical sub-structures of nucleotides and amino acids forming the compatible interacting surfaces. The recognition process is also influenced by the physicochemical and conformational character of the target oligonucleotide motif. Although the role of these mechanisms in DNA-protein interactions is well established, bioinformatics methods rarely address them directly; instead, binding specificity is mostly assessed at the nucleotide level. DNA Readout Viewer (DRV) aims to provide a novel DNA representation, facilitating an in-depth view into these mechanisms through the concurrent visualization of functional groups and a diverse collection of DNA descriptors. By applying its intuitive representation concept to various visualization tasks related to DNA recognition, DRV can contribute to unravelling the binding specificity factors of DNA-protein interactions. Availability and implementation: DRV is freely available at https://drv.brc.hu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Krisztian Adam
- Institute of Biophysics, Biological Research Centre, H-6726 Szeged, Hungary
| | - Zoltan Gyorgypal
- Institute of Biophysics, Biological Research Centre, H-6726 Szeged, Hungary
| | - Zoltan Hegedus
- Institute of Biophysics, Biological Research Centre, H-6726 Szeged, Hungary
- Department of Biochemistry and Medical Chemistry, Medical School, University of Pécs, H-7622 Pécs, Hungary
| |
|
12
|
Sagendorf JM, Markarian N, Berman HM, Rohs R. DNAproDB: an expanded database and web-based tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 2020; 48:D277-D287. [PMID: 31612957 PMCID: PMC7145614 DOI: 10.1093/nar/gkz889] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/15/2019] [Revised: 09/22/2019] [Accepted: 10/01/2019] [Indexed: 11/24/2022] Open
Abstract
DNAproDB (https://dnaprodb.usc.edu) is a web-based database and structural analysis tool that offers a combination of data visualization, data processing and search functionality that improves the speed and ease with which researchers can analyze, access and visualize structural data of DNA–protein complexes. In this paper, we report significant improvements made to DNAproDB since its initial release. DNAproDB now supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations, multiple DNA–protein complexes within a DNAproDB entry and model indexing for analysis of ensemble data. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features, improved structural moiety assignment and use of more sequence-based annotations. We have redesigned our report pages and search forms to support these enhancements, and the DNAproDB website has been improved to be more responsive and user-friendly. DNAproDB is now integrated with the Nucleic Acid Database, and we have increased our coverage of available Protein Data Bank entries. Our database now contains 95% of all available DNA–protein complexes, making our tools for analysis of these structures accessible to a broad community.
Affiliation(s)
- Jared M Sagendorf
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Nicholas Markarian
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Helen M Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Remo Rohs
- Quantitative and Computational Biology, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| |
|
13
|
Santana-Garcia W, Rocha-Acevedo M, Ramirez-Navarro L, Mbouamboua Y, Thieffry D, Thomas-Chollier M, Contreras-Moreira B, van Helden J, Medina-Rivera A. RSAT variation-tools: An accessible and flexible framework to predict the impact of regulatory variants on transcription factor binding. Comput Struct Biotechnol J 2019; 17:1415-1428. [PMID: 31871587 PMCID: PMC6906655 DOI: 10.1016/j.csbj.2019.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/27/2019] [Revised: 09/22/2019] [Accepted: 09/25/2019] [Indexed: 02/06/2023] Open
Abstract
Gene regulatory regions contain short and degenerate DNA binding sites recognized by transcription factors (TFBS). When TFBS harbor SNPs, the DNA binding site may be affected, thereby altering the transcriptional regulation of the target genes. Such regulatory SNPs have been implicated as causal variants in Genome-Wide Association Studies (GWAS). In this study, we describe improved versions of the Variation-tools programs designed to predict regulatory variants, and present four case studies to illustrate their usage and applications. In brief, Variation-tools facilitate i) obtaining variation information, ii) interconversion of variation file formats, iii) retrieval of sequences surrounding variants, and iv) calculating the change in predicted transcription factor affinity scores between alleles, using motif scanning approaches. Notably, the tools support the analysis of haplotypes. The tools are included within the well-maintained suite Regulatory Sequence Analysis Tools (RSAT, http://rsat.eu), and accessible through a web interface that currently enables analysis of five metazoa and ten plant genomes. Variation-tools can also be used from the command line with any locally installed Ensembl genome. Users can input personal collections of variants and motifs, providing flexibility in the analysis.
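Step iv) of the abstract above, the change in a predicted affinity score between two alleles, can be illustrated with a short Python sketch. It is not the RSAT implementation: the count matrix, the uniform background, the pseudocount, and the function names `to_pssm`, `best_score` and `allele_delta` are all assumptions made for the example.

```python
import math

# Hypothetical 4-column count matrix (one dict per motif position); not taken from any real motif database.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 0, "G": 1, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform background frequency
PSEUDOCOUNT = 1.0   # assumed pseudocount added to every cell


def to_pssm(counts):
    """Convert counts to log2-odds weights against the uniform background."""
    pssm = []
    for col in counts:
        total = sum(col.values()) + 4 * PSEUDOCOUNT
        pssm.append({b: math.log2(((col[b] + PSEUDOCOUNT) / total) / BACKGROUND)
                     for b in "ACGT"})
    return pssm


def best_score(seq, pssm, variant_pos):
    """Best score over all forward-strand windows that overlap variant_pos (0-based)."""
    width = len(pssm)
    first = max(0, variant_pos - width + 1)
    last = min(variant_pos, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pssm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def allele_delta(flank5, ref, alt, flank3, pssm):
    """Return (reference score, alternate score, alternate minus reference)."""
    pos = len(flank5)
    ref_score = best_score(flank5 + ref + flank3, pssm, pos)
    alt_score = best_score(flank5 + alt + flank3, pssm, pos)
    return ref_score, alt_score, alt_score - ref_score


if __name__ == "__main__":
    pssm = to_pssm(COUNTS)
    print(allele_delta("TTA", "C", "T", "GTAA", pssm))
```

A negative difference suggests the alternate allele weakens the predicted site. The sketch scans only the forward strand and handles single-nucleotide substitutions; a real analysis would also scan the reverse strand and assess score significance.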
Key Words
- Binding motifs
- CEU, Northern Europeans from Utah
- CRM, Cis-Regulatory Module
- GWAS, Genome Wide Association Studies
- LD, Linkage Disequilibrium
- MPRA, Massively Parallel Reporter Assays
- PSSM, Position Specific Scoring Matrix
- Position specific scoring matrix
- ROC, Receiver Operating Characteristic
- RSAT, Regulatory Sequence Analysis Tools
- Regulatory variants
- SNP, Single Nucleotide Polymorphism
- SNPs
- SOIs, SNPs of Interest
- TF, Transcription Factor
- TFBS, Transcription Factor Binding Site
- Transcription factors
- eQTL, Expression Quantitative Trait Loci
- rsID, Reference SNP Identifier
Affiliation(s)
- Walter Santana-Garcia
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
| | - Maria Rocha-Acevedo
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
| | - Lucia Ramirez-Navarro
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
| | - Yvon Mbouamboua
- Fondation Congolaise pour la Recherche Médicale, Brazzaville, People’s Republic of Congo
- Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
| | - Denis Thieffry
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Morgane Thomas-Chollier
- Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | | | - Jacques van Helden
- Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
- Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (A. Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
| | - Alejandra Medina-Rivera
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, Mexico
- Corresponding authors at: Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México (A. Medina-Rivera). Aix-Marseille Univ, INSERM UMR S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France (J. van Helden).
| |
|
14
|
Mulugeta TD, Nome T, To TH, Gundappa MK, Macqueen DJ, Våge DI, Sandve SR, Hvidsten TR. SalMotifDB: a tool for analyzing putative transcription factor binding sites in salmonid genomes. BMC Genomics 2019; 20:694. [PMID: 31477007 PMCID: PMC6720087 DOI: 10.1186/s12864-019-6051-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/08/2019] [Accepted: 08/21/2019] [Indexed: 12/11/2022] Open
Abstract
Background: Recently developed genome resources in salmonid fish provide tools for studying the genomics underlying a wide range of properties, including life history trait variation in the wild, economically important traits in aquaculture and the evolutionary consequences of whole genome duplications. Although genome assemblies now exist for a number of salmonid species, the lack of regulatory annotations is holding back our mechanistic understanding of how genetic variation in non-coding regulatory regions affects gene expression and the downstream phenotypic effects. Results: We present SalMotifDB, a database and associated web and R interface for the analysis of transcription factors (TFs) and their cis-regulatory binding sites in five salmonid genomes. SalMotifDB integrates TF-binding site information for 3072 non-redundant DNA patterns (motifs) assembled from a large number of metazoan motif databases. Through motif matching and TF prediction, we have used these multi-species databases to construct putative regulatory networks in salmonid species. The utility of SalMotifDB is demonstrated by showing that key lipid metabolism regulators are predicted to regulate a set of genes affected by different lipid and fatty acid content in the feed, and by showing that our motif database explains a significant proportion of gene expression divergence in gene duplicates originating from the salmonid-specific whole genome duplication. Conclusions: SalMotifDB is an effective tool for analyzing transcription factors, their binding sites and the resulting gene regulatory networks in salmonid species, and will be an important tool for gaining a better mechanistic understanding of gene regulation and the associated phenotypes in salmonids. SalMotifDB is available at https://salmobase.org/apps/SalMotifDB. Electronic supplementary material: The online version of this article (10.1186/s12864-019-6051-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Teshome Dagne Mulugeta
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Torfinn Nome
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Thu-Hien To
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Manu Kumar Gundappa
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
| | - Daniel J Macqueen
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
| | - Dag Inge Våge
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Simen Rød Sandve
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
| | - Torgeir R Hvidsten
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway.
| |
|
15
|
Kanofsky K, Riggers J, Staar M, Strauch CJ, Arndt LC, Hehl R. A strong NF-κB p65 responsive cis-regulatory sequence from Arabidopsis thaliana interacts with WRKY40. PLANT CELL REPORTS 2019; 38:1139-1150. [PMID: 31197450 DOI: 10.1007/s00299-019-02433-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/02/2019] [Accepted: 05/28/2019] [Indexed: 06/09/2023]
Abstract
Transcription factors from mammals and plants, which play a role in innate immunity, interact with the same microbe-associated molecular pattern (MAMP)-responsive sequences from Arabidopsis thaliana. The interaction of mouse NF-κB p65 with MAMP-responsive sequences containing the core motif GACTTT of the WT-box was investigated. This revealed one sequence, derived from the promoter of the A. thaliana gene At1g76960, a gene with unknown function, to activate NF-κB p65 dependent reporter gene expression in plant cells very strongly. A bioinformatic analysis predicts three putative NF-κB p65 binding sites in this sequence and all three sites are required for reporter gene activation and binding. The sequence is one of the weakest MAMP-responsive sequences previously isolated, but the introduction of a GCC-box increases its MAMP responsivity in combination with upstream WT-box sequences. Although a bioinformatic analysis of the unmutated cis-sequence only predicts NF-κB p65 binding, A. thaliana WRKY40 was selected in a yeast one-hybrid screen. WRKY40, which is a transcriptional repressor, requires the sequence TTTTCTA for direct binding. This sequence is similar to the WK-box TTTTCCAC, previously shown to interact with tobacco NtWRKY12. In summary, this work supports the similarity in binding site recognition between NF-κB and WRKY factors.
Affiliation(s)
- Konstantin Kanofsky
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Jasmin Riggers
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Marcel Staar
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Claudia Janina Strauch
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Laureen Christin Arndt
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Reinhard Hehl
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany.
| |
|
16
|
How B-DNA Dynamics Decipher Sequence-Selective Protein Recognition. J Mol Biol 2019; 431:3845-3859. [DOI: 10.1016/j.jmb.2019.07.021] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/13/2019] [Revised: 07/09/2019] [Accepted: 07/10/2019] [Indexed: 11/23/2022]
|
17
|
Seto J. On a Robust, Sensitive Cell-Free Method for Pseudomonas Sensing and Quantification in Microfluidic Templated Hydrogels. MICROMACHINES 2019; 10:E506. [PMID: 31370199 PMCID: PMC6723077 DOI: 10.3390/mi10080506] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/03/2019] [Revised: 07/02/2019] [Accepted: 07/29/2019] [Indexed: 12/19/2022]
Abstract
Through the use of droplet microfluidics to integrate cell-free activity into inert hydrogel beads, we have developed a platform that can perform biologically relevant functions without the need for cells. Specifically, cell-free lysates are useful for performing cellular functions and providing biologically relevant metabolic products without requiring the optimal biological conditions for cell growth and proliferation. By teasing out the specific biological components that enable transcription and translation to occur, these cell-like functions can be reconstituted in vitro without requiring the entire cell and its milieu of cellular organelles. This enables the optimization of synthetic biological circuits, either by concentration or logic switches, simply through the addition or removal of genetic components (plasmids, inducers, or repressors) of regulatory elements. Here, we demonstrate an application of cell-free processes that is robust, portable and independent of a substrate, for sensing and reporting a quorum-sensing molecule, N-3-oxododecanoyl homoserine lactone (3OC12HSL), found to be crucial for pathological Pseudomonas aeruginosa infection. We develop an agarose bead platform that is easily adaptable and simply programmable to fit a variety of biological and chemical sensing applications, offering ease of delivery and activation in remote environments, even in conditions with very little hydration.
Affiliation(s)
- Jong Seto
- Department of Bioengineering and Therapeutic Sciences, University of California at San Francisco and California Institute for Quantitative Biosciences (QB3), 1700 4th Street, Byers Hall #303, San Francisco, CA 94158, USA.
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
| |
|
18
|
Sagendorf JM, Berman HM, Rohs R. DNAproDB: an interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 2017; 45:W89-W97. [PMID: 28431131 PMCID: PMC5570235 DOI: 10.1093/nar/gkx272] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/01/2017] [Accepted: 04/06/2017] [Indexed: 02/06/2023] Open
Abstract
Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu.
Affiliation(s)
- Jared M Sagendorf
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| | - Helen M Berman
- RCSB Protein Data Bank, Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Remo Rohs
- Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics & Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA
| |
|
19
|
Emamjomeh A, Choobineh D, Hajieghrari B, MahdiNezhad N, Khodavirdipour A. DNA-protein interaction: identification, prediction and data analysis. Mol Biol Rep 2019; 46:3571-3596. [PMID: 30915687 DOI: 10.1007/s11033-019-04763-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/22/2018] [Accepted: 03/14/2019] [Indexed: 12/30/2022]
Abstract
Life in living organisms depends on specific and purposeful interactions between molecules. Such interactions make the various processes inside the cells and bodies of living organisms possible. Among all types of molecular interactions, DNA-protein interactions are of considerable importance. With the development of numerous experimental techniques, diverse methods are now available for recognizing and investigating such interactions. While the traditional experimental techniques for identifying DNA-protein complexes are time-consuming and unsuitable for genome-scale studies, the current high-throughput approaches are more efficient at determining such interactions at a large scale, but they are too costly to be practical for daily applications. Hence, the availability of abundant information on different biological sequences, a clearer picture of the conditions under which such interactions form, and developments in computing, mathematics, and statistics have motivated scientists to develop bioinformatics tools for predicting the interaction site(s). There has been much progress in this field. In this review, the factors and conditions governing the interaction and the laboratory techniques for examining such interactions are addressed. In addition, the bioinformatics tools developed for this purpose are introduced and compared and, finally, several suggestions are offered for improving the precision of such prediction tools.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran.
| | - Darush Choobineh
- Agricultural Biotechnology, Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
| | - Behzad Hajieghrari
- Department of Agricultural Biotechnology, College of Agriculture, Jahrom University, Jahrom, 74135-111, Iran.
| | - Nafiseh MahdiNezhad
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, 98615-538, Iran
| | - Amir Khodavirdipour
- Division of Human Genetics, Department of Anatomy, St. John's hospital, Bangalore, India
| |
|
20
|
Kanofsky K, Strauch CJ, Sandmann A, Möller A, Hehl R. Transcription factors involved in basal immunity in mammals and plants interact with the same MAMP-responsive cis-sequence from Arabidopsis thaliana. PLANT MOLECULAR BIOLOGY 2018; 98:565-578. [PMID: 30467788 DOI: 10.1007/s11103-018-0796-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/21/2018] [Accepted: 11/15/2018] [Indexed: 06/09/2023]
Abstract
WRKY and NF-κB transcription factors, involved in innate immunity in plants and mammals, interact with the same cis-sequence. Novel microbe-associated molecular pattern (MAMP)-responsive cis-sequences, designated type II WT-boxes, are required for flg22-responsive gene expression in Arabidopsis thaliana protoplasts. While type I WT-boxes like TGACTTTT and CGACTTTT interact with WRKY transcription factors (TFs), the question remained which TFs bind to the type II WT-boxes GGACTTTC, GGACTTTT, and GGACTTTG. Surprisingly, a bioinformatic analysis predicts mouse (Mus musculus) NF-κB p65 as a TF interacting with type II WT-boxes. NF-κB p65, like WRKY factors in plants, plays a role in innate immunity in mammals. Therefore, the interaction of NF-κB p65 with type II WT-boxes was tested experimentally. NF-κB p65 requires the WT-boxes GGACTTTC, GGACTTTT, and GGACTTTG for activating reporter gene expression in plant cells. NF-κB p65 directly binds to WT-box containing synthetic promoters in vitro and requires the WT-box for binding. Earlier studies indicate that the sequence GGACTTTC is also required for WRKY26 mediated reporter gene activation. Here it is shown that WRKY26, like NF-κB p65, binds to the sequence GGACTTTC. Consistent with other recent studies, type II WT boxes are WRKY binding sites and the distinction between type I and type II no longer applies.
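The WT-box sequences named in this abstract are short enough to locate with a plain pattern search. The following Python sketch is only an illustration of that idea, not the bioinformatic analysis the authors used. The function names `revcomp` and `find_wt_boxes` are hypothetical, and the "type I"/"type II" labels simply follow the wording of the abstract, which itself concludes that the distinction no longer applies.

```python
import re

# Patterns built from the sequences listed in the abstract.
# Lookahead groups are used so that overlapping occurrences are all reported.
TYPE_I = re.compile(r"(?=([TC]GACTTTT))")    # TGACTTTT, CGACTTTT
TYPE_II = re.compile(r"(?=(GGACTTT[CTG]))")  # GGACTTTC, GGACTTTT, GGACTTTG

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_wt_boxes(seq):
    """Return (label, strand, start, match) tuples for both strands.

    Start positions are 0-based; for the '-' strand they refer to the
    reverse-complemented sequence, not to the input coordinates.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for label, pattern in (("type I", TYPE_I), ("type II", TYPE_II)):
            for m in pattern.finditer(s):
                hits.append((label, strand, m.start(1), m.group(1)))
    return hits


if __name__ == "__main__":
    for hit in find_wt_boxes("AATGACTTTTCCGGACTTTC"):
        print(hit)
```

Exact matching like this finds only the listed sequences; it says nothing about binding strength or about which factor occupies a site.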
Affiliation(s)
- Konstantin Kanofsky
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Claudia Janina Strauch
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Alexander Sandmann
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Anika Möller
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany
| | - Reinhard Hehl
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106, Braunschweig, Germany.
| |
|
21
|
Majewska M, Wysokińska H, Kuźma Ł, Szymczyk P. Eukaryotic and prokaryotic promoter databases as valuable tools in exploring the regulation of gene transcription: a comprehensive overview. Gene 2017; 644:38-48. [PMID: 29104165 DOI: 10.1016/j.gene.2017.10.079] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/12/2017] [Revised: 07/26/2017] [Accepted: 10/27/2017] [Indexed: 01/02/2023]
Abstract
The complete exploration of the regulation of gene expression remains one of the top-priority goals for researchers. As regulation is mainly controlled at the level of transcription by promoters, the study of promoters and the resulting findings are of great importance. This review summarizes forty selected databases that centralize experimental and theoretical knowledge regarding the organization of promoters, interacting transcription factors (TFs) and microRNAs (miRNAs) in many eukaryotic and prokaryotic species. The presented databases offer researchers valuable support in elucidating the regulation of gene transcription.
Affiliation(s)
- Małgorzata Majewska
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland.
| | - Halina Wysokińska
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland
| | - Łukasz Kuźma
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland
| | - Piotr Szymczyk
- Department of Pharmaceutical Biotechnology, Medical University of Lodz, 90-151 Lodz, Poland
| |
|
22
|
Wilson KA, Wetmore SD. Combining crystallographic and quantum chemical data to understand DNA-protein π-interactions in nature. Struct Chem 2017. [DOI: 10.1007/s11224-017-0954-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022]
|
23
|
Korostelev YD, Zharov IA, Mironov AA, Rakhmaininova AB, Gelfand MS. Identification of Position-Specific Correlations between DNA-Binding Domains and Their Binding Sites. Application to the MerR Family of Transcription Factors. PLoS One 2016; 11:e0162681. [PMID: 27690309 PMCID: PMC5045206 DOI: 10.1371/journal.pone.0162681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 08/26/2016] [Indexed: 11/25/2022] Open
Abstract
The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
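The core quantity in this abstract, mutual information between a protein alignment column and a DNA alignment column, can be sketched in a few lines of Python. This is a plain plug-in estimate on toy data and deliberately omits the corrections for shared phylogenetic history and for over-represented close relatives that the abstract describes; the function names and the toy alignments are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long symbol columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of rows")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        # p_joint / (p_x * p_y) written with raw counts
        mi += p_joint * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def column(alignment, index):
    """Extract one column from a list of equal-length aligned strings."""
    return [row[index] for row in alignment]


def top_pairs(protein_aln, dna_aln, k=3):
    """Rank (protein column, DNA column) pairs by mutual information."""
    scores = []
    for i in range(len(protein_aln[0])):
        for j in range(len(dna_aln[0])):
            mi = mutual_information(column(protein_aln, i), column(dna_aln, j))
            scores.append((mi, i, j))
    scores.sort(key=lambda t: -t[0])
    return scores[:k]


if __name__ == "__main__":
    # Toy paired alignments: row r of each belongs to the same factor/site pair.
    protein = ["KR", "KR", "QR", "QR"]
    dna = ["GA", "GC", "TA", "TC"]
    print(top_pairs(protein, dna, k=1))
```

In the toy data, protein column 0 covaries perfectly with DNA column 0, so that pair ranks first with 1 bit; the invariant protein column 1 carries no information about either DNA column.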
Affiliation(s)
- Yuriy D. Korostelev
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Ilya A. Zharov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Andrey A. Mironov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Alexandra B. Rakhmaininova
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Mikhail S. Gelfand
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
  - Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
|
24
|
Jakubec D, Laskowski RA, Vondrasek J. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis. PLoS One 2016; 11:e0158704. [PMID: 27384774 PMCID: PMC4934765 DOI: 10.1371/journal.pone.0158704] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 04/11/2016] [Accepted: 06/21/2016] [Indexed: 12/24/2022] Open
Abstract
Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue–amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein–DNA complexes by means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties.
Affiliation(s)
- David Jakubec
  - Institute of Organic Chemistry and Biochemistry, Prague 6, Czech Republic
  - Department of Physical and Macromolecular Chemistry, Faculty of Science, Charles University in Prague, Prague 2, Czech Republic
- Roman A. Laskowski
  - EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Jiri Vondrasek
  - Institute of Organic Chemistry and Biochemistry, Prague 6, Czech Republic
|
25
|
FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces. Methods Mol Biol 2016; 1482:259-77. [PMID: 27557773 DOI: 10.1007/978-1-4939-6396-6_17] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/18/2022]
Abstract
FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search to a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses.
|
26
|
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 PMCID: PMC4821295 DOI: 10.12688/f1000research.7408.2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Accepted: 02/29/2016] [Indexed: 11/22/2022] Open
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
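The "information content of a motif" discussed above is commonly computed per column as the reduction in uncertainty relative to a uniform background. The sketch below assumes that convention (uniform background over A, C, G, T and no small-sample correction); the abstract does not state which formula the authors used, and the function names are illustrative only.

```python
import math


def column_information(probs):
    """Information content (bits) of one motif column.

    probs holds the probabilities of A, C, G and T and should sum to 1.
    With a uniform background the maximum is log2(4) = 2 bits.
    """
    entropy_term = sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 + entropy_term


def motif_information(ppm):
    """Total information content of a position probability matrix."""
    return sum(column_information(column) for column in ppm)


# Toy motif: one fully conserved position and one uninformative position.
toy_motif = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
total_bits = motif_information(toy_motif)
```

A high total only says that the columns are conserved; as the abstract argues, it does not by itself indicate that the motif predicts binding well.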
Affiliation(s)
- Caleb Kipkurui Kibet
  - Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick
  - Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
|
27
|
Kibet CK, Machanick P. Transcription factor motif quality assessment requires systematic comparative analysis. F1000Res 2015; 4:ISCB Comm J-1429. [PMID: 27092243 DOI: 10.12688/f1000research.7408.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 11/19/2015] [Indexed: 03/26/2024] Open
Abstract
Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. Finally, we demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
Affiliation(s)
- Caleb Kipkurui Kibet
  - Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
- Philip Machanick
  - Department of Computer Science and Research Unit in Bioinformatics (RUBi), Rhodes University, Grahamstown, South Africa
|
28
|
AlQuraishi M, Tang S, Xia X. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system. BMC Bioinformatics 2015; 16:390. [PMID: 26586237 PMCID: PMC4653904 DOI: 10.1186/s12859-015-0819-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/04/2015] [Accepted: 11/11/2015] [Indexed: 11/28/2022] Open
Abstract
Background Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. Description We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
Conclusions This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
Affiliation(s)
- Mohammed AlQuraishi
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
- Shengdong Tang
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
- Xide Xia
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA, 02115, USA
|
29
|
Analysis of the DNA-Binding Activities of the Arabidopsis R2R3-MYB Transcription Factor Family by One-Hybrid Experiments in Yeast. PLoS One 2015; 10:e0141044. [PMID: 26484765 PMCID: PMC4613820 DOI: 10.1371/journal.pone.0141044] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 07/28/2015] [Accepted: 10/02/2015] [Indexed: 12/20/2022] Open
Abstract
The control of growth and development of all living organisms is a complex and dynamic process that requires the harmonious expression of numerous genes. Gene expression is mainly controlled by the activity of sequence-specific DNA binding proteins called transcription factors (TFs). Amongst the various classes of eukaryotic TFs, the MYB superfamily is one of the largest and most diverse, and it has considerably expanded in the plant kingdom. R2R3-MYBs have been extensively studied over the last 15 years. However, DNA-binding specificity has been characterized for only a small subset of these proteins. Therefore, one of the remaining challenges is the exhaustive characterization of the DNA-binding specificity of all R2R3-MYB proteins. In this study, we have developed a library of Arabidopsis thaliana R2R3-MYB open reading frames, whose DNA-binding activities were assayed in vivo (yeast one-hybrid experiments) with a pool of selected cis-regulatory elements. Altogether 1904 interactions were assayed leading to the discovery of specific patterns of interactions between the various R2R3-MYB subgroups and their DNA target sequences and to the identification of key features that govern these interactions. The present work provides a comprehensive in vivo analysis of R2R3-MYB binding activities that should help in predicting new DNA motifs and identifying new putative target genes for each member of this very large family of TFs. In a broader perspective, the generated data will help to better understand how TFs interact with their target DNA sequences.
|
30
|
Yang J, Ramsey SA. A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites. Bioinformatics 2015; 31:3445-50. [PMID: 26130577 DOI: 10.1093/bioinformatics/btv391] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/18/2014] [Accepted: 06/24/2015] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites. RESULTS We compared a traditional PWM model to a model that combines the PWM with a DNA shape feature-based regulatory potential score, for accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model, for 45% of TFs tested, with no significant loss of accuracy for the remaining TFs. AVAILABILITY AND IMPLEMENTATION The shape-based model is available as an open-source R package that is archived on the GitHub software repository at https://github.com/ramseylab/regshape/. CONTACT stephen.ramsey@oregonstate.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
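The independence assumption mentioned in the abstract is visible in how a PWM scores a candidate site: the total is a plain sum of per-position log-odds terms. The sketch below illustrates only that PWM baseline, assuming a uniform background and an arbitrary pseudocount of 1; the function names and example sites are hypothetical, and the DNA shape score of the regshape package is not reproduced here.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Estimate a log-odds PWM from equal-length binding site sequences."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds; positions contribute independently."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


# Three toy binding sites (hypothetical data).
pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
consensus_score = score(pwm, "ACGT")
mismatch_score = score(pwm, "TTTT")
```

Because each position is scored separately, two sequences with the same per-position base frequencies but different neighbour context receive the same score, which is the limitation that the shape-derived features are meant to address.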
Affiliation(s)
- Stephen A Ramsey
  - Department of Biomedical Sciences and School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
|
31
|
An overview of the prediction of protein DNA-binding sites. Int J Mol Sci 2015; 16:5194-215. [PMID: 25756377 PMCID: PMC4394471 DOI: 10.3390/ijms16035194] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 12/31/2014] [Revised: 02/21/2015] [Accepted: 02/27/2015] [Indexed: 02/06/2023] Open
Abstract
Interactions between proteins and DNA play an important role in many essential biological processes such as DNA replication, transcription, splicing, and repair. The identification of amino acid residues involved in DNA-binding sites is critical for understanding the mechanism of these biological activities. In the last decade, numerous computational approaches have been developed to predict protein DNA-binding sites based on protein sequence and/or structural information, which play an important role in complementing experimental strategies. At this time, approaches can be divided into three categories: sequence-based DNA-binding site prediction, structure-based DNA-binding site prediction, and homology modeling and threading. In this article, we review existing research on computational methods to predict protein DNA-binding sites, which includes data sets, various residue sequence/structural features, machine learning methods for comparison and selection, evaluation methods, performance comparison of different tools, and future directions in protein DNA-binding site prediction. In particular, we detail the meta-analysis of protein DNA-binding sites. We also propose specific implications that are likely to result in novel prediction methods, increased performance, or practical applications.
|
32
|
Wilson KA, Wetmore SD. A Survey of DNA–Protein π–Interactions: A Comparison of Natural Occurrences and Structures, and Computationally Predicted Structures and Strengths. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2015. [DOI: 10.1007/978-3-319-14163-3_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
|
33
|
Park B, Kim H, Han K. DBBP: database of binding pairs in protein-nucleic acid interactions. BMC Bioinformatics 2014; 15 Suppl 15:S5. [PMID: 25474259 PMCID: PMC4271565 DOI: 10.1186/1471-2105-15-s15-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023] Open
Abstract
Background Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes. Results We analyzed the huge amount of structure data for the hydrogen bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, http://bclab.inha.ac.kr/dbbp). DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions. Conclusions Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.
|
34
|
Joyce AP, Zhang C, Bradley P, Havranek JJ. Structure-based modeling of protein: DNA specificity. Brief Funct Genomics 2014; 14:39-49. [PMID: 25414269 DOI: 10.1093/bfgp/elu044] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Protein:DNA interactions are essential to a range of processes that maintain and express the information encoded in the genome. Structural modeling is an approach that aims to understand these interactions at the physicochemical level. It has been proposed that structural modeling can lead to deeper understanding of the mechanisms of protein:DNA interactions, and that progress in this field can not only help to rationalize the observed specificities of DNA-binding proteins but also to allow researchers to engineer novel DNA site specificities. In this review we discuss recent developments in the structural description of protein:DNA interactions and specificity, as well as the challenges facing the field in the future.
|
35
|
Dubos C, Kelemen Z, Sebastian A, Bülow L, Huep G, Xu W, Grain D, Salsac F, Brousse C, Lepiniec L, Weisshaar B, Contreras-Moreira B, Hehl R. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes. BMC Genomics 2014; 15:317. [PMID: 24773781 PMCID: PMC4234446 DOI: 10.1186/1471-2164-15-317] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/08/2013] [Accepted: 04/16/2014] [Indexed: 11/22/2022] Open
Abstract
Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.
Affiliation(s)
- Reinhard Hehl
  - Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106 Braunschweig, Germany
|
36
|
Sebastian A, Contreras-Moreira B. footprintDB: a database of transcription factors with annotated cis elements and binding interfaces. Bioinformatics 2013; 30:258-65. [PMID: 24234003 DOI: 10.1093/bioinformatics/btt663] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/23/2022]
Abstract
MOTIVATION Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. RESULTS FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. AVAILABILITY AND IMPLEMENTATION Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.
Affiliation(s)
- Alvaro Sebastian
  - Laboratory of Computational Biology, Department of Genetics and Plant Production, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza (http://www.eead.csic.es/compbio) and Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain
|
37
|
Milhinhos A, Prestele J, Bollhöner B, Matos A, Vera-Sirera F, Rambla JL, Ljung K, Carbonell J, Blázquez MA, Tuominen H, Miguel CM. Thermospermine levels are controlled by an auxin-dependent feedback loop mechanism in Populus xylem. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2013; 75:685-98. [PMID: 23647338 DOI: 10.1111/tpj.12231] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 03/28/2013] [Revised: 04/29/2013] [Accepted: 05/01/2013] [Indexed: 05/03/2023]
Abstract
Polyamines are small polycationic amines that are widespread in living organisms. Thermospermine, synthesized by thermospermine synthase ACAULIS5 (ACL5), was recently shown to be an endogenous plant polyamine. Thermospermine is critical for proper vascular development and xylem cell specification, but it is not known how thermospermine homeostasis is controlled in the xylem. We present data in the Populus model system supporting the existence of a negative feedback control of thermospermine levels in stem xylem tissues, the main site of thermospermine biosynthesis. While over-expression of the ACL5 homologue in Populus, POPACAULIS5, resulted in strong up-regulation of ACL5 expression and thermospermine accumulation in leaves, the corresponding levels in the secondary xylem tissues of the stem were similar or lower than those in the wild-type. POPACAULIS5 over-expression had a negative effect on accumulation of indole-3-acetic acid, while exogenous auxin had a positive effect on POPACAULIS5 expression, thus promoting thermospermine accumulation. Further, over-expression of POPACAULIS5 negatively affected expression of the class III homeodomain leucine zipper (HD-Zip III) transcription factor gene PttHB8, a homologue of AtHB8, while up-regulation of PttHB8 positively affected POPACAULIS5 expression. These results indicate that excessive accumulation of thermospermine is prevented by a negative feedback control of POPACAULIS5 transcript levels through suppression of indole-3-acetic acid levels, and that PttHB8 is involved in the control of POPACAULIS5 expression. We propose that this negative feedback loop functions to maintain steady-state levels of thermospermine, which is required for proper xylem development, and that it is dependent on the presence of high concentrations of endogenous indole-3-acetic acid, such as those present in the secondary xylem tissues.
Affiliation(s)
- Ana Milhinhos
  - Instituto de Biologia Experimental e Tecnológica, Apartado 12, 2781-901, Oeiras, Portugal
|
38
|
Serra TS, Figueiredo DD, Cordeiro AM, Almeida DM, Lourenço T, Abreu IA, Sebastián A, Fernandes L, Contreras-Moreira B, Oliveira MM, Saibo NJM. OsRMC, a negative regulator of salt stress response in rice, is regulated by two AP2/ERF transcription factors. PLANT MOLECULAR BIOLOGY 2013; 82:439-55. [PMID: 23703395 DOI: 10.1007/s11103-013-0073-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 10/04/2012] [Accepted: 05/13/2013] [Indexed: 05/03/2023]
Abstract
High salinity causes remarkable losses in rice productivity worldwide mainly because it inhibits growth and reduces grain yield. To cope with environmental changes, plants evolved several adaptive mechanisms, which involve the regulation of many stress-responsive genes. Among these, we have chosen OsRMC to study its transcriptional regulation in rice seedlings subjected to high salinity. Its transcription was highly induced by salt treatment and showed a stress-dose-dependent pattern. OsRMC encodes a receptor-like kinase described as a negative regulator of salt stress responses in rice. To investigate how OsRMC is regulated in response to high salinity, a salt-induced rice cDNA expression library was constructed and subsequently screened using the yeast one-hybrid system and the OsRMC promoter as bait. Thereby, two transcription factors (TFs), OsEREBP1 and OsEREBP2, belonging to the AP2/ERF family were identified. Both TFs were shown to bind to the same GCC-like DNA motif in OsRMC promoter and to negatively regulate its gene expression. The identified TFs were characterized regarding their gene expression under different abiotic stress conditions. This study revealed that OsEREBP1 transcript level is not significantly affected by salt, ABA or severe cold (5 °C) and is only slightly regulated by drought and moderate cold. On the other hand, the OsEREBP2 transcript level increased after cold, ABA, drought and high salinity treatments, indicating that OsEREBP2 may play a central role mediating the response to different abiotic stresses. Gene expression analysis in rice varieties with contrasting salt tolerance further suggests that OsEREBP2 is involved in salt stress response in rice.
Affiliation(s)
- Tânia S Serra
- Genomics of Plant Stress Laboratory, Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal
|
39
|
Abstract
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. 
With PiDNA, the users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. As well, it is expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
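As a rough illustration of the final step described in this abstract (building a PWM from a set of favourable sequences and using it to rank candidate sites), the following minimal sketch may help. It is not PiDNA's method: the knowledge-based scoring of mutated structures is not reproduced, the input sequences are invented for illustration, and the function names `build_pwm`, `score_site` and `rank_sites`, the pseudocount and the uniform background are all assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sequences, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        # Log2 ratio of the smoothed base frequency to the background frequency.
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        }
        pwm.append(column)
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


def rank_sites(pwm, candidates):
    """Return candidate sites sorted from highest to lowest PWM score."""
    return sorted(candidates, key=lambda site: score_site(pwm, site), reverse=True)


# Hypothetical "favourable" sequences, standing in for low-energy mutants.
favourable = ["GCCGCC", "GCCGCC", "GCCGAC", "GCGGCC"]
pwm = build_pwm(favourable)
ranking = rank_sites(pwm, ["ATATAT", "GCCGCC", "GCCGAT"])
```

In this toy setting, the candidate closest to the favourable set ranks first; a real application would derive the input sequences from structure-based energy estimates rather than from a hand-picked list.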
Affiliation(s)
- Chih-Kang Lin
- Center for Systems Biology, National Taiwan University, Taipei 106, Taiwan
|
40
|
Abstract
The 3DNA software package is a popular and versatile bioinformatics tool with capabilities to analyze, construct, and visualize three-dimensional nucleic acid structures. This article presents detailed protocols for a subset of new and popular features available in 3DNA, applicable to both individual structures and ensembles of related structures. Protocol 1 lists the set of instructions needed to download and install the software. This is followed, in Protocol 2, by the analysis of a nucleic acid structure, including the assignment of base pairs and the determination of rigid-body parameters that describe the structure and, in Protocol 3, by a description of the reconstruction of an atomic model of a structure from its rigid-body parameters. The most recent version of 3DNA, version 2.1, has new features for the analysis and manipulation of ensembles of structures, such as those deduced from nuclear magnetic resonance (NMR) measurements and molecular dynamics (MD) simulations; these features are presented in Protocols 4 and 5. In addition to the 3DNA stand-alone software package, the w3DNA web server, located at http://w3dna.rutgers.edu, provides a user-friendly interface to selected features of the software. Protocol 6 demonstrates a novel feature of the site for building models of long DNA molecules decorated with bound proteins at user-specified locations.
Affiliation(s)
- Andrew V Colasanti
- Department of Chemistry & Chemical Biology and BioMaPS Institute for Quantitative Biology, Rutgers - The State University of New Jersey.
|
41
|
Gromiha MM, Nagarajan R. Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein-DNA complexes. Advances in Protein Chemistry and Structural Biology 2013; 91:65-99. [PMID: 23790211 DOI: 10.1016/b978-0-12-411637-5.00003-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022]
Abstract
Protein-DNA recognition plays an important role in the regulation of gene expression. Understanding the influence of specific residues for protein-DNA interactions and the recognition mechanism of protein-DNA complexes is a challenging task in molecular and computational biology. Several computational approaches have been put forward to tackle these problems from different perspectives: (i) development of databases for the interactions between protein and DNA and binding specificity of protein-DNA complexes, (ii) structural analysis of protein-DNA complexes, (iii) discriminating DNA-binding proteins from amino acid sequence, (iv) prediction of DNA-binding sites and protein-DNA binding specificity using sequence and/or structural information, and (v) understanding the recognition mechanism of protein-DNA complexes. In this review, we focus on all these issues and extensively discuss the advancements on the development of comprehensive bioinformatics databases for protein-DNA interactions, efficient tools for identifying the binding sites, and plausible mechanisms for understanding the recognition of protein-DNA complexes. Further, the available online resources for understanding protein-DNA interactions are collectively listed, which will serve as ready-to-use information for the research community.
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India.
|
42
|
Abstract
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
Affiliation(s)
- Alvaro Sebastian
- Laboratory of Computational Biology, Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei/CSIC, Av. Montañana, Spain.
|
43
|
Kirsanov DD, Zanegina ON, Aksianov EA, Spirin SA, Karyagina AS, Alexeevski AV. NPIDB: Nucleic acid-Protein Interaction DataBase. Nucleic Acids Res 2012. [PMID: 23193292 PMCID: PMC3531207 DOI: 10.1093/nar/gks1199] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
The Nucleic acid-Protein Interaction DataBase (http://npidb.belozersky.msu.ru/) contains information derived from structures of DNA-protein and RNA-protein complexes extracted from the Protein Data Bank (3846 complexes in October 2012). It provides a web interface and a set of tools for extracting biologically meaningful characteristics of nucleoprotein complexes. The content of the database is updated weekly. The current version of the Nucleic acid-Protein Interaction DataBase is an upgrade of the version published in 2007. The improvements include a new web interface, new tools for calculation of intermolecular interactions, a classification of SCOP families that contain DNA-binding protein domains and data on conserved water molecules on the DNA-protein interface.
Affiliation(s)
- Dmitry D Kirsanov
- Department of Mathematical Methods in Biology, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
|
44
|
Xu D. Protein databases on the internet. Current Protocols in Protein Science 2012; Chapter 2:2.6.1-2.6.17. [PMID: 23151744 DOI: 10.1002/0471140864.ps0206s70] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Protein databases have become a crucial part of modern biology. Huge amounts of data for protein structures, functions, and particularly sequences are being generated. Searching databases is often the first step in the study of a new protein. Comparison between proteins or between protein families provides information about the relationship between proteins within a genome or across different species, and hence offers much more information than can be obtained by studying only an isolated protein. In addition, secondary databases derived from experimental databases are also widely available. These databases reorganize and annotate the data or provide predictions. The use of multiple databases often helps researchers understand the structure and function of a protein. Although some protein databases are widely known, they are far from being fully utilized in the protein science community. This unit provides a starting point for readers to explore the potential of protein databases on the Internet.
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri
|
45
|
Turner D, Kim R, Guo JT. TFinDit: transcription factor-DNA interaction data depository. BMC Bioinformatics 2012; 13:220. [PMID: 22943312 PMCID: PMC3483241 DOI: 10.1186/1471-2105-13-220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/18/2012] [Accepted: 08/23/2012] [Indexed: 11/28/2022]
Abstract
Background: One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions. Description: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. Conclusions: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Affiliation(s)
- Daniel Turner
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
46
|
Liu LA, Bradley P. Atomistic modeling of protein-DNA interaction specificity: progress and applications. Curr Opin Struct Biol 2012; 22:397-405. [PMID: 22796087 DOI: 10.1016/j.sbi.2012.06.002] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 03/19/2012] [Accepted: 06/20/2012] [Indexed: 12/22/2022]
Abstract
An accurate, predictive understanding of protein-DNA binding specificity is crucial for the successful design and engineering of novel protein-DNA binding complexes. In this review, we summarize recent studies that use atomistic representations of interfaces to predict protein-DNA binding specificity computationally. Although methods with limited structural flexibility have proven successful at recapitulating consensus binding sequences from wild-type complex structures, conformational flexibility is likely important for design and template-based modeling, where non-native conformations need to be sampled and accurately scored. A successful application of such computational modeling techniques in the construction of the TAL-DNA complex structure is discussed. With continued improvements in energy functions, solvation models, and conformational sampling, we are optimistic that reliable and large-scale protein-DNA binding prediction and engineering is a goal within reach.
|
47
|
Gabdoulline R, Eckweiler D, Kel A, Stegmaier P. 3DTF: a web server for predicting transcription factor PWMs using 3D structure-based energy calculations. Nucleic Acids Res 2012; 40:W180-5. [PMID: 22693215 PMCID: PMC3394331 DOI: 10.1093/nar/gks551] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
We present the webserver 3D transcription factor (3DTF) to compute position-specific weight matrices (PWMs) of transcription factors using a knowledge-based statistical potential derived from crystallographic data on protein–DNA complexes. Analysis of available structures that can be used to construct PWMs shows that there are hundreds of 3D structures from which PWMs could be derived, as well as thousands of proteins homologous to these. Therefore, we created 3DTF, which delivers binding matrices given the experimental or modeled protein–DNA complex. The webserver can be used by biologists to derive novel PWMs for transcription factors lacking known binding sites and is freely accessible at http://www.gene-regulation.com/pub/programs/3dtf/.
Affiliation(s)
- R Gabdoulline
- Heinrich-Heine University of Duesseldorf, Universitaetstr. 1, 40225 Duesseldorf, Germany
|
48
|
Nadzirin N, Gardiner EJ, Willett P, Artymiuk PJ, Firdaus-Raih M. SPRITE and ASSAM: web servers for side chain 3D-motif searching in protein structures. Nucleic Acids Res 2012; 40:W380-6. [PMID: 22573174 PMCID: PMC3394286 DOI: 10.1093/nar/gks401] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/28/2023]
Abstract
Similarities in the 3D patterns of amino acid side chains can provide insights into their function despite the absence of any detectable sequence or fold similarities. Search for protein sites (SPRITE) and amino acid pattern search for substructures and motifs (ASSAM) are graph theoretical programs that can search for 3D amino side chain matches in protein structures, by representing the amino acid side chains as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. Both programs require the input file to be in the PDB format. The objective of using SPRITE is to identify matches of side chains in a query structure to patterns with characterized function. In contrast, a 3D pattern of interest can be searched for existing occurrences in available PDB structures using ASSAM. Both programs are freely accessible without any login requirement. SPRITE is available at http://mfrlab.org/grafss/sprite/ while ASSAM can be accessed at http://mfrlab.org/grafss/assam/.
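The labeled-graph representation described in this abstract can be sketched as follows: each side chain becomes a labeled pseudo-atom (node), and each pair of pseudo-atoms is joined by an edge carrying their distance. This is only an illustrative sketch, not the SPRITE or ASSAM algorithm. The brute-force matcher, the 1.0 distance tolerance, the function names `build_graph` and `find_matches`, and all coordinates are assumptions made for the example.

```python
import itertools
import math


def build_graph(pseudo_atoms):
    """Turn [(residue_label, (x, y, z)), ...] into node labels plus distance edges."""
    labels = [label for label, _ in pseudo_atoms]
    edges = {}
    for i, j in itertools.combinations(range(len(pseudo_atoms)), 2):
        # Edge weight is the Euclidean distance between the two pseudo-atoms.
        edges[(i, j)] = math.dist(pseudo_atoms[i][1], pseudo_atoms[j][1])
    return labels, edges


def find_matches(pattern, structure, tolerance=1.0):
    """Brute-force search for node assignments that preserve labels and distances."""
    p_labels, p_edges = build_graph(pattern)
    s_labels, s_edges = build_graph(structure)
    matches = []
    for combo in itertools.permutations(range(len(s_labels)), len(p_labels)):
        # Node labels (residue types) must agree position by position.
        if any(s_labels[s] != p_labels[p] for p, s in enumerate(combo)):
            continue
        # Every pattern edge must have a structure edge of similar length.
        if all(
            abs(dist - s_edges[tuple(sorted((combo[i], combo[j])))]) <= tolerance
            for (i, j), dist in p_edges.items()
        ):
            matches.append(combo)
    return matches


# Made-up pseudo-atom coordinates; not taken from any real PDB entry.
pattern = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
structure = [
    ("GLY", (10.0, 10.0, 10.0)),
    ("SER", (1.0, 1.0, 1.0)),
    ("HIS", (4.0, 1.0, 1.0)),
    ("ASP", (4.0, 5.0, 1.0)),
    ("ASP", (20.0, 20.0, 20.0)),
]
matches = find_matches(pattern, structure)
```

The exhaustive permutation search is only practical for small patterns; it is used here to keep the example short, and a production tool would rely on a dedicated subgraph-matching algorithm.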
Affiliation(s)
- Nurul Nadzirin
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
|
49
|
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri
|
50
|
Benchmarks for flexible and rigid transcription factor-DNA docking. BMC Structural Biology 2011; 11:45. [PMID: 22044637 PMCID: PMC3262759 DOI: 10.1186/1472-6807-11-45] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/09/2011] [Accepted: 11/01/2011] [Indexed: 12/27/2022]
Abstract
Background: Structural insight from transcription factor-DNA (TF-DNA) complexes is of paramount importance to our understanding of the affinity and specificity of TF-DNA interaction, and to the development of structure-based prediction of TF binding sites. Yet the majority of the TF-DNA complexes remain unsolved despite the considerable experimental efforts being made. Computational docking represents a promising alternative to bridge the gap. To facilitate the study of TF-DNA docking, carefully designed benchmarks are needed for performance evaluation and identification of the strengths and weaknesses of docking algorithms. Results: We constructed two benchmarks, for flexible and rigid TF-DNA docking respectively, using a unified non-redundant set of 38 test cases. The test cases encompass diverse fold families and are classified into easy and hard groups with respect to the degree of difficulty in TF-DNA docking. The major parameters used to classify expected docking difficulty in flexible docking are the conformational differences between bound and unbound TFs and the interaction strength between TFs and DNA. For rigid docking, in which the starting structure is a bound TF conformation, only interaction strength is considered. Conclusions: We believe these benchmarks are important for the development of better interaction potentials and TF-DNA docking algorithms, which has important implications for structure-based prediction of transcription factor binding sites and drug design.
|