1
Singh S, Kiran M, Somvanshi PR. Computational Inference of Gene Regulatory Network Using Genome-wide ChIP-X Data. Methods Mol Biol 2024; 2719:295-306. [PMID: 37803124 DOI: 10.1007/978-1-0716-3461-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
Gene regulatory network is the architecture of transcription factors (TFs) and their gene targets, which help in controlling their expression as required by a phenotype during various environmental perturbations. Inferring the regulatory network from the high-throughput data needs an algorithmic approach involving statistical analysis. There are several interaction databases such as JASPAR and SwissRegulon that provide information for TFs-targets pair interaction, which are estimated based on experimental and prediction procedures. These repositories are majorly used for predicting the complex structure of GRNs either with or without gene expression data. Here we described and discussed the step-wise procedures to extract the interaction data for a desired set of target-TFs from the JASPAR database, and used that information to infer the network by using the igraph library. Further, we also mentioned the important parameters for analyzing the different properties of the network. The described procedure will be helpful in discerning the GRN based on the set of TF-gene pairs.
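As a rough illustration of the network-building step this abstract describes, the sketch below assembles a directed network from TF-target pairs and computes in-degree and out-degree, two basic network properties. The chapter itself uses the igraph library; this is a dependency-free stand-in written in plain Python, and the pair list and function names are hypothetical placeholders, not JASPAR output.

```python
from collections import defaultdict

# Hypothetical TF -> target pairs (placeholders, not taken from JASPAR).
tf_target_pairs = [
    ("TF_A", "gene1"),
    ("TF_A", "gene2"),
    ("TF_B", "gene2"),
    ("TF_B", "TF_A"),
]


def build_network(pairs):
    """Return an adjacency map {regulator: set(targets)} for a directed network."""
    adjacency = defaultdict(set)
    for tf, target in pairs:
        adjacency[tf].add(target)
        # Register targets that regulate nothing so every node appears as a key.
        adjacency.setdefault(target, set())
    return dict(adjacency)


def degrees(adjacency):
    """Return (in_degree, out_degree) dictionaries for every node."""
    out_degree = {node: len(targets) for node, targets in adjacency.items()}
    in_degree = {node: 0 for node in adjacency}
    for targets in adjacency.values():
        for target in targets:
            in_degree[target] += 1
    return in_degree, out_degree


network = build_network(tf_target_pairs)
in_degree, out_degree = degrees(network)
```

In this toy network, a node with high out-degree would be a candidate hub regulator, while high in-degree marks a gene under the control of several TFs.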
Affiliation(s)
- Samayaditya Singh
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Manjari Kiran
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Pramod R Somvanshi
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
2
Yi HC, You ZH, Wang MN, Guo ZH, Wang YB, Zhou JR. RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information. BMC Bioinformatics 2020; 21:60. [PMID: 32070279 PMCID: PMC7029608 DOI: 10.1186/s12859-020-3406-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/18/2019] [Accepted: 02/11/2020] [Indexed: 01/03/2023] Open
Abstract
Background: The interactions between non-coding RNAs (ncRNA) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions. However, these methods are time-consuming and expensive. Accurate and efficient computational methods can assist and accelerate the study of ncRNA-protein interactions.
Results: In this work, we develop a stacking ensemble computational framework, RPI-SE, for effectively predicting ncRNA-protein interactions. More specifically, to fully exploit protein and RNA sequence feature, Position Weight Matrix combined with Legendre Moments is applied to obtain protein evolutionary information. Meanwhile, k-mer sparse matrix is employed to extract efficient feature of ncRNA sequences. Finally, an ensemble learning framework integrated different types of base classifier is developed to predict ncRNA-protein interactions using these discriminative features. The accuracy and robustness of RPI-SE was evaluated on three benchmark data sets under five-fold cross-validation and compared with other state-of-the-art methods.
Conclusions: The results demonstrate that RPI-SE is competent for ncRNA-protein interactions prediction task with high accuracy and robustness. It’s anticipated that this work can provide a computational prediction tool to advance ncRNA-protein interactions related biomedical research.
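To make the idea of k-mer sequence features concrete, here is a minimal sketch that turns an RNA sequence into a fixed-length vector of k-mer frequencies. It is a simplified stand-in, not the k-mer sparse matrix encoding used by RPI-SE; the function name, the default k, and the example sequence are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed (lexicographic) k-mer order."""
    kmers = ["".join(letters) for letters in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of width k along the sequence and count each k-mer seen.
    for start in range(max(len(sequence) - k + 1, 0)):
        kmer = sequence[start:start + k]
        if kmer in counts:  # windows with unexpected symbols (e.g. N) are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Toy sequence chosen only for illustration.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

A vector like `features` has 4^k entries regardless of sequence length, which is what lets sequences of different lengths be fed to the same classifier.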
Affiliation(s)
- Hai-Cheng Yi
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhu-Hong You
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, 336000, China
- Zhen-Hao Guo
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Yan-Bin Wang
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- Ji-Ren Zhou
  - Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
3
Fedonin GG, Eroshkin A, Cieplak P, Matveev EV, Ponomarev GV, Gelfand MS, Ratnikov BI, Kazanov MD. Predictive models of protease specificity based on quantitative protease-activity profiling data. Biochim Biophys Acta Proteins Proteom 2019; 1867:140253. [PMID: 31330204 DOI: 10.1016/j.bbapap.2019.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2019] [Revised: 07/09/2019] [Accepted: 07/17/2019] [Indexed: 10/26/2022]
Abstract
Bioinformatics-based prediction of protease substrates can help to elucidate regulatory proteolytic pathways that control a broad range of biological processes such as apoptosis and blood coagulation. The majority of published predictive models are position weight matrices (PWM) reflecting specificity of proteases toward target sequence. These models are typically derived from experimental data on positions of hydrolyzed peptide bonds and show a reasonable predictive power. New emerging techniques that not only register the cleavage position but also measure catalytic efficiency of proteolysis are expected to improve the quality of predictions or at least substantially reduce the number of tested substrates required for confident predictions. The main goal of this study was to develop new prediction models based on such data and to estimate the performance of the constructed models. We used data on catalytic efficiency of proteolysis measured for eight major human matrix metalloproteinases to construct predictive models of protease specificity using a variety of regression analysis techniques. The obtained results suggest that efficiency-based (quantitative) models show a comparable performance with conventional PWM-based algorithms, while less training data are required. The derived list of candidate cleavage sites in human secreted proteins may serve as a starting point for experimental analysis.
Affiliation(s)
- Gennady G Fedonin
  - Central Research Institute of Epidemiology, Moscow 111123, Russia; A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia; Moscow Institute of Physics and Technology, Dolgoprudny 141700, Russia
- Alexey Eroshkin
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Piotr Cieplak
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Gennady V Ponomarev
  - A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia
- Mikhail S Gelfand
  - A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia; Skolkovo Institute of Science and Technology, Moscow 121205, Russia; National Research University Higher School of Economics, Moscow 101000, Russia
- Boris I Ratnikov
  - Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Marat D Kazanov
  - A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia; Skolkovo Institute of Science and Technology, Moscow 121205, Russia; Dmitry Rogachev National Medical Research Center of Pediatric Hematology, Oncology and Immunology, Moscow 117997, Russia
4
Ruan S, Stormo GD. Comparison of discriminative motif optimization using matrix and DNA shape-based models. BMC Bioinformatics 2018; 19:86. [PMID: 29510689 PMCID: PMC5840810 DOI: 10.1186/s12859-018-2104-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/09/2017] [Accepted: 03/01/2018] [Indexed: 12/12/2022] Open
Abstract
Background: Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site’s activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA “words” of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare to optimized matrix models.
Results: We describe a program “Discriminative Additive Model Optimization” (DAMO) that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare to a recent study where structural parameters, serving as features in a gradient boosting classifier algorithm, are shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM.
Conclusion: To appropriately compare different models for TF bind sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2104-7) contains supplementary material, which is available to authorized users.
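The evaluation idea in this abstract, scoring sequences with an additive PWM and summarizing discrimination between positive and negative examples by AUROC, can be sketched as follows. This only evaluates a fixed PWM; it does not perform the optimization that DAMO carries out. The count matrix, sequences, pseudocount, and uniform background are all toy assumptions.

```python
import math

BASES = "ACGT"


def log_odds_pwm(count_columns, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into an additive log-odds PWM (bits)."""
    pwm = []
    for column in count_columns:
        total = sum(column.get(base, 0) for base in BASES) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def best_site_score(pwm, sequence):
    """Score every window of the sequence and return the best additive score."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def auroc(positive_scores, negative_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy count matrix favouring the motif "TGA" (hypothetical, not a JASPAR entry).
toy_counts = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 8, "C": 1, "G": 1, "T": 0},
]
pwm = log_odds_pwm(toy_counts)
positives = ["CCTGACC", "ATGAGG"]   # contain the motif
negatives = ["CCCCCCC", "GGGGCGG"]  # do not contain it
area = auroc(
    [best_site_score(pwm, s) for s in positives],
    [best_site_score(pwm, s) for s in negatives],
)
```

Because each position contributes an independent term to the sum, this scoring function embodies the independence assumption that the abstract calls an approximation.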
Affiliation(s)
- Shuxiang Ruan
  - Department of Genetics and Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
- Gary D Stormo
  - Department of Genetics and Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA
5
Abstract
Background: Position weight matrix (PWM) and sequence logo are the most widely used representations of transcription factor binding site (TFBS) in biological sequences. Sequence logo - a graphical representation of PWM, has been widely used in scientific publications and reports, due to its easiness of human perception, rich information, and simple format. Different from sequence logo, PWM works great as a precise and compact digitalized form, which can be easily used by a variety of motif analysis software. There are a few available tools to generate sequence logos from PWM; however, no tool does the reverse. Such tool to convert sequence logo back to PWM is needed to scan a TFBS represented in logo format in a publication where the PWM is not provided or hard to be acquired. A major difficulty in developing such tool to convert sequence logo to PWM is to deal with the diversity of sequence logo images.
Results: We propose logo2PWM for reconstructing PWM from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources of different logo format show that the correlation between the reconstructed PWMs and the original PWMs are constantly high, where median correlation is greater than 0.97.
Conclusion: Because of the high recognition accuracy, the easiness of usage, and, the availability of both web-based service and stand-alone application, we believe that logo2PWM can readily benefit the study of transcription by filling the gap between sequence logo and PWM.
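The relationship between a PWM column and the letter heights in a sequence logo can be sketched numerically. The code below uses the standard information-content definition for DNA (2 bits minus the column entropy) without a small-sample correction, and then inverts the heights back to probabilities. It says nothing about how logo2PWM reads heights from images; the function names are hypothetical.

```python
import math


def logo_heights(column):
    """Map base probabilities of one PWM column to logo letter heights in bits."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    information = 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA
    return {base: p * information for base, p in column.items()}


def heights_to_probabilities(heights):
    """Recover base probabilities from letter heights by normalizing the stack."""
    stack = sum(heights.values())
    if stack == 0:
        # A zero-height stack carries no information; assume a uniform column.
        return {base: 1.0 / len(heights) for base in heights}
    return {base: h / stack for base, h in heights.items()}


column = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
heights = logo_heights(column)
recovered = heights_to_probabilities(heights)
```

Since every letter height is the base probability multiplied by the same per-column information value, relative heights within a stack are enough to recover the column, which is why a logo can in principle be converted back to a PWM.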
Affiliation(s)
- Zhen Gao
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
- Lu Liu
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
- Jianhua Ruan
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, 78249, TX, USA
6
Wu S, Han J, Zhang X, Zhong D, Liu R. A computational model for predicting integrase catalytic domain of retrovirus. J Theor Biol 2017; 423:63-70. [PMID: 28454901 DOI: 10.1016/j.jtbi.2017.04.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/16/2017] [Revised: 04/01/2017] [Accepted: 04/21/2017] [Indexed: 11/23/2022]
Abstract
Integrase catalytic domain (ICD) is an essential part in the retrovirus for integration reaction, which enables its newly synthesized DNA to be incorporated into the DNA of infected cells. Owing to the crucial role of ICD for the retroviral replication and the absence of an equivalent of integrase in host cells, it is comprehensible that ICD is a promising drug target for therapeutic intervention. However, annotated ICDs in UniProtKB database have still been insufficient for a good understanding of their statistical characteristics so far. Accordingly, it is of great importance to put forward a computational ICD model in this work to annotate these domains in the retroviruses. The proposed model then discovered 11,660 new putative ICDs after scanning sequences without ICD annotations. Subsequently in order to provide much confidence in ICD prediction, it was tested under different cross-validation methods, compared with other database search tools, and verified on independent datasets. Furthermore, an evolutionary analysis performed on the annotated ICDs of retroviruses revealed a tight connection between ICD and retroviral classification. All the datasets involved in this paper and the application software tool of this model can be available for free download at https://sourceforge.net/projects/icdtool/files/?source=navbar.
7
Swindell WR, Sarkar MK, Stuart PE, Voorhees JJ, Elder JT, Johnston A, Gudjonsson JE. Psoriasis drug development and GWAS interpretation through in silico analysis of transcription factor binding sites. Clin Transl Med 2015; 4:13. [PMID: 25883770 PMCID: PMC4392043 DOI: 10.1186/s40169-015-0054-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 11/19/2014] [Accepted: 02/26/2015] [Indexed: 12/22/2022] Open
Abstract
Background: Psoriasis is a cytokine-mediated skin disease that can be treated effectively with immunosuppressive biologic agents. These medications, however, are not equally effective in all patients and are poorly suited for treating mild psoriasis. To develop more targeted therapies, interfering with transcription factor (TF) activity is a promising strategy.
Methods: Meta-analysis was used to identify differentially expressed genes (DEGs) in the lesional skin from psoriasis patients (n = 237). We compiled a dictionary of 2935 binding sites representing empirically-determined binding affinities of TFs and unconventional DNA-binding proteins (uDBPs). This dictionary was screened to identify “psoriasis response elements” (PREs) overrepresented in sequences upstream of psoriasis DEGs.
Results: PREs are recognized by IRF1, ISGF3, NF-kappaB and multiple TFs with helix-turn-helix (homeo) or other all-alpha-helical (high-mobility group) DNA-binding domains. We identified a limited set of DEGs that encode proteins interacting with PRE motifs, including TFs (GATA3, EHF, FOXM1, SOX5) and uDBPs (AVEN, RBM8A, GPAM, WISP2). PREs were prominent within enhancer regions near cytokine-encoding DEGs (IL17A, IL19 and IL1B), suggesting that PREs might be incorporated into complex decoy oligonucleotides (cdODNs). To illustrate this idea, we designed a cdODN to concomitantly target psoriasis-activated TFs (i.e., FOXM1, ISGF3, IRF1 and NF-kappaB). Finally, we screened psoriasis-associated SNPs to identify risk alleles that disrupt or engender PRE motifs. This identified possible sites of allele-specific TF/uDBP binding and showed that PREs are disproportionately disrupted by psoriasis risk alleles.
Conclusions: We identified new TF/uDBP candidates and developed an approach that (i) connects transcriptome informatics to cdODN drug development and (ii) enhances our ability to interpret GWAS findings. Disruption of PRE motifs by psoriasis risk alleles may contribute to disease susceptibility.
Electronic supplementary material The online version of this article (doi:10.1186/s40169-015-0054-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- William R Swindell
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- Mrinal K Sarkar
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- Philip E Stuart
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- John J Voorhees
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- James T Elder
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- Andrew Johnston
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
- Johann E Gudjonsson
  - Department of Dermatology, University of Michigan School of Medicine, Ann Arbor, MI 48109-2200 USA
8
Rhee JK, Shin SY, Zhang BT. Construction of microRNA functional families by a mixture model of position weight matrices. PeerJ 2013; 1:e199. [PMID: 24255813 PMCID: PMC3817585 DOI: 10.7717/peerj.199] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/13/2013] [Accepted: 10/10/2013] [Indexed: 12/23/2022] Open
Abstract
MicroRNAs (miRNAs) are small regulatory molecules that repress the translational processes of their target genes by binding to their 3′ untranslated regions (3′ UTRs). Because the target genes are predominantly determined by their sequence complementarity to the miRNA seed regions (nucleotides 2–7) which are evolutionarily conserved, it is inferred that the target relationships and functions of the miRNA family members are conserved across many species. Therefore, detecting the relevant miRNA families with confidence would help to clarify the conserved miRNA functions, and elucidate miRNA-mediated biological processes. We present a mixture model of position weight matrices for constructing miRNA functional families. This model systematically finds not only evolutionarily conserved miRNA family members but also functionally related miRNAs, as it simultaneously generates position weight matrices representing the conserved sequences. Using mammalian miRNA sequences, in our experiments, we identified potential miRNA groups characterized by similar sequence patterns that have common functions. We validated our results using score measures and by the analysis of the conserved targets. Our method would provide a way to comprehensively identify conserved miRNA functions.
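The seed region mentioned in this abstract (nucleotides 2–7 of the mature miRNA) suggests a simple baseline for grouping miRNAs, sketched below. This is plain seed matching, a much cruder technique than the mixture model of position weight matrices the paper proposes. The miRNA names and sequences are toy placeholders, not database entries.

```python
from collections import defaultdict


def seed(mirna_sequence):
    """Return nucleotides 2-7 (1-based) of a mature miRNA sequence."""
    return mirna_sequence[1:7]


def group_by_seed(mirnas):
    """Group miRNA names by identical seed; returns {seed: [names]}."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return dict(families)


# Toy sequences for illustration only (names are hypothetical).
toy_mirnas = {
    "miR-toy-1": "UGAGGUAGUAGGUUGUAUAGUU",
    "miR-toy-2": "UGAGGUAGUAGGUUGUGUGGUU",
    "miR-toy-3": "UAGCUUAUCAGACUGAUGUUGA",
}
families = group_by_seed(toy_mirnas)
```

Exact seed identity misses miRNAs that are functionally related but differ slightly in sequence, which is the kind of case a probabilistic model over position weight matrices is meant to capture.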
Affiliation(s)
- Je-Keun Rhee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea