1. Su Y, Yu G, Li D, Lu Y, Ren C, Xu Y, Yang Y, Zhang K, Ma T, Li Z. Identification of mitophagy-related biomarkers in human osteoporosis based on a machine learning model. Front Physiol 2024; 14:1289976. [PMID: 38260098 PMCID: PMC10800828 DOI: 10.3389/fphys.2023.1289976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 12/21/2023] [Indexed: 01/24/2024]
Abstract
Background: Osteoporosis (OP) is a chronic bone metabolic disease and a serious global public health problem. Several studies have shown that mitophagy plays an important role in bone metabolism disorders; however, its role in osteoporosis remains unclear. Methods: The GSE56815 dataset, which contains samples with low and high bone mineral density (BMD), was downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were analyzed. Mitochondrial autophagy-related genes (MRG) were obtained from the existing literature, and highly correlated MRG were screened by bioinformatics methods. The two sets of results were combined to obtain differentially expressed (DE)-MRG, and Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were performed. Protein-protein interaction (PPI) network analysis, support vector machine recursive feature elimination (SVM-RFE), and the Boruta method were used to identify DE-MRG. A receiver operating characteristic (ROC) curve was drawn, a nomogram model was constructed to determine its diagnostic value, and a variety of bioinformatics methods were used to verify the relationship between these related genes and OP, including GO and KEGG analysis, Ingenuity Pathway Analysis (IPA), and single-sample Gene Set Enrichment Analysis (ssGSEA). In addition, a hub gene-related network was constructed and potential drugs for the treatment of OP were predicted. Finally, the specific genes were verified by real-time quantitative polymerase chain reaction (RT-qPCR). Results: In total, 548 DEGs were identified in the GSE56815 dataset. Weighted gene co-expression network analysis (WGCNA) identified 2291 key module genes, and 91 DE-MRG were obtained by combining the two. The PPI network revealed that AKT1 interacted with the most proteins. Three MRG (NELFB, SFSWAP, and MAP3K3) were identified as hub genes, with areas under the curve (AUC) of 0.75, 0.71, and 0.70, respectively. The nomogram model had high diagnostic value. GO and KEGG analysis showed that the ribosome pathway and the cellular ribosome pathway may regulate the progression of OP. IPA showed that MAP3K3 was associated with six pathways, including GNRH Signaling. The ssGSEA indicated that NELFB was negatively correlated with iDCs (cor = -0.390, p < 0.001). The regulatory network showed a complex relationship between miRNAs, transcription factors (TFs), and hub genes. In addition, four drugs, including vinclozolin, were predicted to be potential therapeutic drugs for OP. In RT-qPCR verification, the expression of the hub gene NELFB was consistent with the results of the bioinformatics analysis. Conclusion: Mitophagy plays an important role in the development of osteoporosis. The identification of three mitophagy-related genes may contribute to the early diagnosis, mechanism research, and treatment of OP.
Affiliation(s)
- Yu Su: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Gangying Yu: Department of International Ward (Orthopedic), Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Dongchen Li: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Yao Lu: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Cheng Ren: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Yibo Xu: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Yanling Yang: Basic Medical College of Yan’an University, Yan’an, China
- Kun Zhang: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Teng Ma: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China
- Zhong Li: Honghui Hospital, Xi’an Jiaotong University, Xi’an, China

2. Zhao Z, Wang Q, Zhao F, Ma J, Sui X, Choe HC, Chen P, Gao X, Zhang L. Single-cell and transcriptomic analyses reveal the influence of diabetes on ovarian cancer. BMC Genomics 2024; 25:1. [PMID: 38166541 PMCID: PMC10759538 DOI: 10.1186/s12864-023-09893-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024]
Abstract
BACKGROUND There has been a significant surge in the global prevalence of diabetes mellitus (DM), which increases the susceptibility of individuals to ovarian cancer (OC). However, the relationship between DM and OC remains largely unexplored. The objective of this study is to provide preliminary insights into the shared molecular regulatory mechanisms and potential biomarkers between DM and OC. METHODS Multiple datasets from the GEO database were utilized for bioinformatics analysis. Single-cell datasets from the GEO database were analysed. Subsequently, immune cell infiltration analysis was performed on mRNA expression data. The intersection of these datasets yielded a set of common genes associated with both OC and DM. Using these overlapping genes and Cytoscape, a protein‒protein interaction (PPI) network was constructed, and 10 core targets were selected. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were then conducted on these core targets. Additionally, advanced bioinformatics analyses were conducted to construct a TF-mRNA-miRNA coregulatory network based on the identified core targets. Furthermore, immunohistochemistry (IHC) staining and real-time quantitative PCR (RT-qPCR) were employed to validate the expression and biological functions of core proteins, including HSP90AA1, HSPA8, and SOD1, and of the transcription factors SREBF2 and GATA2, in ovarian tumors. RESULTS The immune cell infiltration analysis based on mRNA expression data for both DM and OC, as well as the analysis of single-cell datasets, revealed significant differences in mononuclear cell levels. By intersecting the single-cell datasets, a total of 119 targets related to mononuclear cells in both OC and DM were identified. PPI network analysis further identified 10 hub genes, including HSP90AA1, HSPA8, SNRPD2, UBA52, SOD1, RPL13A, RPSA, ITGAM, PPP1CC, and PSMA5, as potential targets of OC and DM. Enrichment analysis indicated that these genes are primarily associated with neutrophil degranulation, GDP-dissociation inhibitor activity, and the IL-17 signaling pathway, suggesting their involvement in the regulation of the tumor microenvironment. Furthermore, the TF-gene and miRNA-gene regulatory networks were validated using NetworkAnalyst. The identified TFs included SREBF2, GATA2, and SRF, while the miRNAs included miR-320a, miR-378a-3p, and miR-26a-5p. In addition, IHC and RT-qPCR revealed differential expression of core targets in ovarian tumors after the onset of diabetes. RT-qPCR further revealed that SREBF2 and GATA2 may influence the expression of core proteins, including HSP90AA1, HSPA8, and SOD1. CONCLUSION This study revealed the shared gene interaction network between OC and DM and predicted the TFs and miRNAs associated with core genes in monocytes. Our research findings contribute to identifying potential biological mechanisms underlying the relationship between OC and DM.
Affiliation(s)
- Zhihao Zhao: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China
- Qilin Wang: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China
- Fang Zhao: Institute of Innovation and Applied Research in Chinese Medicine, Department of Rheumatology of The First Hospital, Hunan University of Chinese Medicine, Changsha, Hunan, China
- Junnan Ma: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China
- Xue Sui: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China
- Hyok Chol Choe: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China; Department of Clinical Medicine, Sinuiju Medical University, Sinuiju, Democratic People's Republic of Korea
- Peng Chen: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China
- Xue Gao: Department of Pathology, the First Hospital of Dalian Medical University, Dalian, Liaoning Province, 116027, China
- Lin Zhang: Institute (College) of Integrative Medicine, Dalian Medical University, Dalian, China

3. Jiao P, Wang B, Wang X, Liu B, Wang Y, Li J. Struct2GO: protein function prediction based on graph pooling algorithm and AlphaFold2 structure information. Bioinformatics 2023; 39:btad637. [PMID: 37847755 PMCID: PMC10612405 DOI: 10.1093/bioinformatics/btad637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 10/05/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION In recent years, there has been a breakthrough in protein structure prediction, and the AlphaFold2 model of the DeepMind team has improved the accuracy of protein structure prediction to the atomic level. Currently, deep learning-based protein function prediction models usually extract features from protein sequences and combine them with protein-protein interaction networks to achieve good results. However, for newly sequenced proteins that are not in the protein-protein interaction network, such models cannot make effective predictions. To address this, this article proposes the Struct2GO model, which combines protein structure and sequence data to enhance the precision of protein function prediction and the generality of the model. RESULTS We obtain amino acid residue embeddings in protein structure through graph representation learning, utilize the graph pooling algorithm based on a self-attention mechanism to obtain the whole graph structure features, and fuse them with sequence features obtained from the protein language model. The results demonstrate that compared with the traditional protein sequence-based function prediction model, the Struct2GO model achieves better results. AVAILABILITY AND IMPLEMENTATION The data underlying this article are available at https://github.com/lyjps/Struct2GO.
Affiliation(s)
- Peishun Jiao: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guang Dong 518055, China
- Beibei Wang: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guang Dong 518055, China
- Xuan Wang: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guang Dong 518055, China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Bo Liu: Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yadong Wang: Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Junyi Li: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guang Dong 518055, China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China; Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China

4. Dong Q, Han Z, Tian L. Identification of Serum Exosome-Derived circRNA-miRNA-TF-mRNA Regulatory Network in Postmenopausal Osteoporosis Using Bioinformatics Analysis and Validation in Peripheral Blood-Derived Mononuclear Cells. Front Endocrinol (Lausanne) 2022; 13:899503. [PMID: 35757392 PMCID: PMC9218277 DOI: 10.3389/fendo.2022.899503] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/18/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Osteoporosis is one of the most common systemic metabolic bone diseases, especially in postmenopausal women. Circular RNA (circRNA) has been implicated in various human diseases. However, the potential role of circRNAs in postmenopausal osteoporosis (PMOP) remains largely unknown. The study aims to identify potential biomarkers and further understand the mechanism of PMOP by constructing a circRNA-associated ceRNA network. METHODS The PMOP-related datasets GSE161361, GSE64433, and GSE56116 were downloaded from the Gene Expression Omnibus (GEO) database and were used to obtain differentially expressed genes (DEGs). Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were applied to determine possible relevant functions of differentially expressed messenger RNAs (mRNAs). The TRRUST database was used to predict differential transcription factor (TF)-mRNA regulatory pairs. Afterwards, by combining CircBank and miRTarBase, circRNA-miRNA and miRNA-TF pairs were constructed. Then, a circRNA-miRNA-TF-mRNA network was established. Next, the correlation of mRNAs and TFs with PMOP was verified using the Comparative Toxicogenomics Database. Expression levels of key genes in the ceRNA network, including circRNAs, miRNAs, TFs, and mRNAs, were further validated by quantitative real-time PCR (qRT-PCR). Furthermore, to screen for signaling pathways related to key mRNAs of the ceRNA network, Gene Set Enrichment Analysis (GSEA) was performed. RESULTS A total of 1201 DE mRNAs, 44 DE miRNAs, and 1613 DE circRNAs associated with PMOP were obtained. GO function annotation showed that DE mRNAs were mainly related to inflammatory responses. KEGG analysis revealed that DE mRNAs were mainly enriched in osteoclast differentiation, rheumatoid arthritis, hematopoietic cell lineage, and cytokine-cytokine receptor interaction pathways. We first identified 26 TFs and their target mRNAs. Combining DE miRNAs, miRNA-TF/mRNA pairs were obtained. Combining DE circRNAs, we constructed a ceRNA network containing 6 circRNAs, 4 miRNAs, 4 TFs, and 12 mRNAs. The expression levels of most genes detected by qRT-PCR were generally consistent with the microarray results. Combined with the qRT-PCR validation results, we eventually identified a ceRNA network containing 4 circRNAs, 3 miRNAs, 3 TFs, and 9 mRNAs. The GSEA revealed that the 9 mRNAs participate in many important signaling pathways, such as "olfactory transduction", "T cell receptor signaling pathway", and "neuroactive ligand-receptor interaction". These pathways have been reported to be involved in the occurrence and development of PMOP. In summary, key mRNAs in the ceRNA network may participate in the development of osteoporosis by regulating related signaling pathways. CONCLUSIONS A circRNA-associated ceRNA network containing TFs was established for PMOP. The study may help further explore the molecular mechanisms of PMOP, and the network components may serve as potential biomarkers or therapeutic targets.
Affiliation(s)
- Qianqian Dong: The First School of Clinical Medicine, Lanzhou University, Lanzhou, China; Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, China; Clinical Research Center for Metabolic Disease, Gansu Provincial Hospital, Lanzhou, China
- Ziqi Han: Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, China; Clinical Research Center for Metabolic Disease, Gansu Provincial Hospital, Lanzhou, China
- Limin Tian (corresponding author): The First School of Clinical Medicine, Lanzhou University, Lanzhou, China; Department of Endocrinology, Gansu Provincial Hospital, Lanzhou, China; Clinical Research Center for Metabolic Disease, Gansu Provincial Hospital, Lanzhou, China

5. Sharma V, Monti P, Fronza G, Inga A. Human transcription factors in yeast: the fruitful examples of P53 and NF-κB. FEMS Yeast Res 2016; 16:fow083. [PMID: 27683095 DOI: 10.1093/femsyr/fow083] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Accepted: 09/24/2016] [Indexed: 12/31/2022]
Abstract
The observation that human transcription factors (TFs) can function when expressed in yeast cells has stimulated the development of various functional assays to investigate (i) the role of binding site sequences (herein referred to as response elements, REs) in transactivation specificity, (ii) the impact of polymorphic nucleotide variants on transactivation potential, (iii) the functional consequences of mutations in TFs and (iv) the impact of cofactors or small molecules. These approaches have found applications in basic as well as applied research, including the identification and the characterisation of mutant TF alleles from clinical samples. The ease of genome editing of yeast cells and the availability of regulated systems for ectopic protein expression enabled the development of quantitative reporter systems, integrated at a chosen chromosomal locus in isogenic yeast strains that differ only at the level of a specific RE targeted by a TF or for the expression of distinct TF alleles. In many cases, these assays proved predictive of results in higher eukaryotes. The potential to work in small volume formats and the availability of yeast strains with modified chemical uptake have enhanced the scalability of these approaches. Next to well-established one-, two- and three-hybrid assays, the functional assays with non-chimeric human TFs enrich the palette of opportunities for functional characterisation. We review ∼25 years of research on human sequence-specific TFs expressed in yeast, with an emphasis on the P53 and NF-κB family of proteins, highlighting outcomes, advantages, challenges and limitations of these heterologous assays.
Affiliation(s)
- Vasundhara Sharma: Centre for Integrative Biology, CIBIO, University of Trento, via Sommarive 9, 38123, Trento, Italy
- Paola Monti: U.O.C. Mutagenesi, IRCCS AOU San Martino-IST, Largo R. Benzi, 10, 16132, Genova, Italy
- Gilberto Fronza: U.O.C. Mutagenesi, IRCCS AOU San Martino-IST, Largo R. Benzi, 10, 16132, Genova, Italy
- Alberto Inga: Centre for Integrative Biology, CIBIO, University of Trento, via Sommarive 9, 38123, Trento, Italy

6. FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces. Methods Mol Biol 2016; 1482:259-77. [PMID: 27557773 DOI: 10.1007/978-1-4939-6396-6_17] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/18/2022]
Abstract
FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search to a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses.

7. Analysis of the DNA-Binding Activities of the Arabidopsis R2R3-MYB Transcription Factor Family by One-Hybrid Experiments in Yeast. PLoS One 2015; 10:e0141044. [PMID: 26484765 PMCID: PMC4613820 DOI: 10.1371/journal.pone.0141044] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 07/28/2015] [Accepted: 10/02/2015] [Indexed: 12/20/2022]
Abstract
The control of growth and development of all living organisms is a complex and dynamic process that requires the harmonious expression of numerous genes. Gene expression is mainly controlled by the activity of sequence-specific DNA binding proteins called transcription factors (TFs). Amongst the various classes of eukaryotic TFs, the MYB superfamily is one of the largest and most diverse, and it has considerably expanded in the plant kingdom. R2R3-MYBs have been extensively studied over the last 15 years. However, DNA-binding specificity has been characterized for only a small subset of these proteins. Therefore, one of the remaining challenges is the exhaustive characterization of the DNA-binding specificity of all R2R3-MYB proteins. In this study, we have developed a library of Arabidopsis thaliana R2R3-MYB open reading frames, whose DNA-binding activities were assayed in vivo (yeast one-hybrid experiments) with a pool of selected cis-regulatory elements. Altogether, 1904 interactions were assayed, leading to the discovery of specific patterns of interactions between the various R2R3-MYB subgroups and their DNA target sequences and to the identification of key features that govern these interactions. The present work provides a comprehensive in vivo analysis of R2R3-MYB binding activities that should help in predicting new DNA motifs and identifying new putative target genes for each member of this very large family of TFs. In a broader perspective, the generated data will help to better understand how TFs interact with their target DNA sequences.

8. Dubos C, Kelemen Z, Sebastian A, Bülow L, Huep G, Xu W, Grain D, Salsac F, Brousse C, Lepiniec L, Weisshaar B, Contreras-Moreira B, Hehl R. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes. BMC Genomics 2014; 15:317. [PMID: 24773781 PMCID: PMC4234446 DOI: 10.1186/1471-2164-15-317] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/08/2013] [Accepted: 04/16/2014] [Indexed: 11/22/2022]
Abstract
Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.
Affiliation(s)
- Reinhard Hehl: Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106 Braunschweig, Germany

9. Sebastian A, Contreras-Moreira B. footprintDB: a database of transcription factors with annotated cis elements and binding interfaces. Bioinformatics 2013; 30:258-65. [PMID: 24234003 DOI: 10.1093/bioinformatics/btt663] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 12/23/2022]
Abstract
MOTIVATION Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. RESULTS FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with the proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. AVAILABILITY AND IMPLEMENTATION Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.
Affiliation(s)
- Alvaro Sebastian: Laboratory of Computational Biology, Department of Genetics and Plant Production, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza (http://www.eead.csic.es/compbio) and Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain