1
Ding K, Dixit G, Parker BJ, Wen J. CRMnet: A deep learning model for predicting gene expression from large regulatory sequence datasets. Front Big Data 2023; 6:1113402. [PMID: 36999047 PMCID: PMC10043243 DOI: 10.3389/fdata.2023.1113402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 02/23/2023] [Indexed: 03/17/2023]
Abstract
Recent large datasets measuring the gene expression of millions of possible gene promoter sequences provide a resource to design and train optimized deep neural network architectures to predict expression from sequences. High predictive performance due to the modeling of dependencies within and between regulatory sequences is an enabler for biological discoveries in gene regulation through model interpretation techniques. To understand the regulatory code that delineates gene expression, we have designed a novel deep-learning model (CRMnet) to predict gene expression in Saccharomyces cerevisiae. Our model outperforms the current benchmark models and achieves a Pearson correlation coefficient of 0.971 and a mean squared error of 3.200. Interpretation of informative genomic regions determined from model saliency maps, and overlapping the saliency maps with known yeast motifs, supports that our model can successfully locate the binding sites of transcription factors that actively modulate gene expression. We compare our model's training times on a large compute cluster with GPUs and Google TPUs to indicate practical training times on similar datasets.
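The two headline metrics above, Pearson's r and mean squared error between measured and predicted expression, can be computed as in the minimal pure-Python sketch below. The function names and toy values are illustrative assumptions, not taken from the CRMnet code or data.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy expression values (hypothetical, not from the CRMnet dataset)
measured = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 12.0, 7.0, 14.0]
print(pearson_r(measured, predicted), mean_squared_error(measured, predicted))
```

MSE is expressed in squared units of the expression measure, so its magnitude is only comparable between models evaluated on the same scale, whereas Pearson's r is scale-invariant.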
Affiliation(s)
- Ke Ding
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Gunjan Dixit
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Brian J. Parker
- School of Computing and Biological Data Science Institute, Australian National University, Canberra, ACT, Australia
- *Correspondence: Brian J. Parker
- Jiayu Wen
- Division of Genome Science and Cancer, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- *Correspondence: Jiayu Wen
2
3
Kaufmann B, Willinger O, Kikuchi N, Navon N, Kermas L, Goldberg S, Amit R. An Oligo-Library-Based Approach for Mapping DNA-DNA Triplex Interactions In Vitro. ACS Synth Biol 2021; 10:1808-1820. [PMID: 34374529 DOI: 10.1021/acssynbio.1c00122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
We present Triplex-seq, a deep-sequencing method that systematically maps the interaction space between an oligo library of ssDNA triplex-forming oligos (TFOs) and a particular dsDNA triplex target site (TTS). We demonstrate the method using a randomized oligo library comprising 67 million variants, with five TTSs that differ in guanine (G) content, at two different buffer conditions, denoted pH 5 and pH 7. Our results show that G-rich triplexes form at both pH 5 and pH 7, with the pH 5 set being more stable, indicating that there is a subset of TFOs that form triplexes only at pH 5. In addition, using information analysis, we identify triplex-forming motifs (TFMs), which correspond to minimal functional TFO sequences. We demonstrate, in single-variant verification experiments, that TFOs with these TFMs indeed form a triplex with G-rich TTSs, and that a single mutation in the TFM motif can alleviate binding. Our results show that deep-sequencing platforms can substantially expand our understanding of triplex binding rules and aid in refining the DNA triplex code.
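The abstract does not say which information measure Triplex-seq uses to identify TFMs. As a hedged illustration of the general idea only, the sketch below computes the per-position Shannon information content of aligned, enriched oligo sequences, a common way to see which positions are constrained. The choice of measure, the function name, and the toy sequences are assumptions.

```python
import math
from collections import Counter


def information_content(aligned_seqs, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length sequences.

    IC = log2(|alphabet|) - H, where H is the Shannon entropy of the base
    frequencies at that position (no small-sample correction applied).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    max_bits = math.log2(len(alphabet))
    ic = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(max_bits - entropy)
    return ic


# Hypothetical enriched oligos: the first two positions are fully conserved
print(information_content(["GGA", "GGC", "GGT", "GGA"]))
```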
Affiliation(s)
- Beate Kaufmann
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Or Willinger
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Nanami Kikuchi
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Noa Navon
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Lisa Kermas
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Sarah Goldberg
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Roee Amit
- Department of Biotechnology and Food Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Russell Berrie Nanotechnology Institute, Technion - Israel Institute of Technology, Haifa 32000, Israel
4
Nielsen MM, Pedersen JS. miRNA activity inferred from single cell mRNA expression. Sci Rep 2021; 11:9170. [PMID: 33911110 PMCID: PMC8080788 DOI: 10.1038/s41598-021-88480-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/17/2020] [Accepted: 04/08/2021] [Indexed: 01/26/2023]
Abstract
High throughput single-cell RNA sequencing (scRNAseq) can provide mRNA expression profiles for thousands of cells. However, miRNAs cannot currently be studied at the same scale. By exploiting the fact that miRNAs bind well-defined sequence motifs and typically down-regulate target genes, we show that motif enrichment analysis can be used to derive miRNA activity estimates from scRNAseq data. Motif enrichment analyses have traditionally been used to derive binding motifs for regulatory factors, such as miRNAs or transcription factors, that have an effect on gene expression. Here we reverse its use. By starting from the miRNA seed site, we derive a measure of activity for miRNAs in single cells. We first establish the approach on a comprehensive set of bulk TCGA cancer samples (n = 9679), with paired mRNA and miRNA expression profiles, where many miRNAs show a strong correlation with measured expression. By downsampling we show that the method can be used to estimate miRNA activity in sparse data comparable to scRNAseq experiments. We then analyze a human and a mouse scRNAseq data set, and show that for several miRNA candidates, including liver-specific miR-122 and muscle-specific miR-1 and miR-133a, we obtain activity measures supported by the literature. The methods are implemented and made available in the miReact software. Our results demonstrate that miRNA activities can be estimated at the single-cell level. This allows insights into the dynamics of miRNA activity across a range of fields where scRNAseq is applied.
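The starting point described above, going from the miRNA seed to genes carrying a matching site and then asking whether those genes are down-regulated in a cell, can be sketched as follows. This is a deliberately simplified stand-in: miReact uses motif enrichment analysis, whereas this sketch only compares mean expression of seed-site genes against all other genes. The 7mer-m8 site definition (complement of miRNA nucleotides 2-8) is standard; the helper names and toy data are hypothetical, and the miR-122-5p sequence is included for illustration.

```python
def seed_site(mirna):
    """Return the 7mer-m8 site (DNA letters) complementary to miRNA nt 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]  # nucleotides 2-8, 1-based
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seed))


def activity_score(expression, utrs, mirna):
    """Toy per-cell activity: mean expression of non-targets minus targets.

    expression: dict mapping gene -> expression value in one cell
    utrs: dict mapping gene -> 3' UTR sequence in DNA letters
    A positive value is consistent with down-regulation of seed-site genes.
    """
    site = seed_site(mirna)
    targets = {g for g in expression if site in utrs.get(g, "").upper()}
    others = [g for g in expression if g not in targets]
    if not targets or not others:
        return 0.0  # no contrast possible

    def mean(genes):
        return sum(expression[g] for g in genes) / len(genes)

    return mean(others) - mean(targets)


# miR-122-5p; its seed-matched 3' UTR site is ACACTCC
mir122 = "UGGAGUGUGACAAUGGUGUUUG"
cell = {"geneA": 1.0, "geneB": 5.0, "geneC": 6.0}  # hypothetical values
utrs = {"geneA": "TTACACTCCAA", "geneB": "GGGGCC", "geneC": "TTTTAA"}
print(seed_site(mir122), activity_score(cell, utrs, mir122))
```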
Affiliation(s)
- Morten Muhlig Nielsen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark; Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark
- Jakob Skou Pedersen
- Department of Molecular Medicine (MOMA), Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, 8200, Aarhus N, Denmark; Department of Clinical Medicine, Aarhus University, Palle Juul-Jensens Boulevard 82, 8200, Aarhus N, Denmark; Bioinformatics Research Centre, C.F. Møllers Allé 8, Aarhus University, 8000, Aarhus C, Denmark
5
Delos Santos NP, Texari L, Benner C. MEIRLOP: improving score-based motif enrichment by incorporating sequence bias covariates. BMC Bioinformatics 2020; 21:410. [PMID: 32938397 PMCID: PMC7493370 DOI: 10.1186/s12859-020-03739-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/25/2020] [Accepted: 09/04/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate a transcriptional response to a stimulus, or to identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. Uncontrolled sequence biases, such as those often found in CpG islands, can obscure the enrichment of biologically relevant motifs. RESULTS We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions, while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs found enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs. CONCLUSIONS Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.
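The core idea described above, modelling motif presence with logistic regression on the region score while including sequence-composition covariates, can be sketched in pure Python. This assumes binary motif presence per region and uses GC content as the only covariate (the abstract also mentions dinucleotide composition); it is not MEIRLOP's code or API, and all names and data are hypothetical.

```python
import math


def gc_content(seq):
    """Fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(features, labels, lr=0.5, n_iter=2000):
    """Fit logistic regression by batch gradient ascent; returns [b0, b1, ...]."""
    p = len(features[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, y in zip(features, labels):
            err = y - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(features) for wj, g in zip(w, grad)]
    return w


def score_coefficient(seqs, scores, has_motif):
    """Log-odds change in motif presence per SD of score, adjusted for GC."""
    gc = [gc_content(s) for s in seqs]
    features = list(zip(standardize(scores), standardize(gc)))
    return fit_logistic(features, has_motif)[1]


# Hypothetical regions, their activity scores, and motif presence (1/0)
seqs = ["ATAT", "GCGC", "ATGC", "GGAT", "ATTA", "GCAT", "CCGG", "ATAA"]
scores = [1, 2, 3, 4, 5, 6, 7, 8]
has_motif = [0, 0, 1, 0, 1, 0, 1, 1]
print(score_coefficient(seqs, scores, has_motif))
```

A positive coefficient indicates that the motif is more often present in high-scoring regions after accounting for GC content.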
Affiliation(s)
- Nathaniel P Delos Santos
- Department of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
- Lorane Texari
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
- Christopher Benner
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA.
6
RNA-centric approaches to study RNA-protein interactions in vitro and in silico. Methods 2020; 178:11-18. [DOI: 10.1016/j.ymeth.2019.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/19/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/17/2023]
7
Carazo F, Romero JP, Rubio A. Upstream analysis of alternative splicing: a review of computational approaches to predict context-dependent splicing factors. Brief Bioinform 2020; 20:1358-1375. [PMID: 29390045 DOI: 10.1093/bib/bby005] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/24/2017] [Revised: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) has been shown to play a pivotal role in the development of diseases, including cancer. Specifically, all the hallmarks of cancer (angiogenesis, cell immortality, avoiding immune system response, etc.) have been found to have a counterpart in aberrant splicing of key genes. Identifying the context-specific regulators of splicing provides valuable information for finding new biomarkers and defining alternative therapeutic strategies. The computational models to identify these regulators are not trivial and require three conceptual steps: the detection of AS events, the identification of splicing factors that potentially regulate these events, and the contextualization of these pieces of information for a specific experiment. In this work, we review the different algorithmic methodologies developed for each of these tasks. The main weaknesses and strengths of the different steps of the pipeline are discussed. Finally, a case study is detailed to help the reader be aware of the potential and limitations of this computational approach.
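The first pipeline step, detecting AS events, is commonly quantified with a percent spliced-in (PSI) value. A minimal sketch for a cassette (skipped) exon, assuming junction read counts are already available; dividing each count by the number of junctions that can produce it is a common simplification, not a method prescribed by this review, and the function name is hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads,
                       inclusion_junctions=2, exclusion_junctions=1):
    """PSI for a cassette exon from junction read counts.

    Counts are divided by the number of junctions that can produce them,
    because exon inclusion is supported by two junctions (upstream and
    downstream) but exon skipping by only one. Returns NaN with no reads.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        return float("nan")
    return inc / (inc + exc)


# Hypothetical counts: 40 reads on inclusion junctions, 10 on the skipping junction
print(percent_spliced_in(40, 10))
```

Comparing PSI between conditions is then the usual basis for calling differential splicing events before searching for candidate regulators.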
8
Hashim FA, Houssein EH, Hussain K, Mabrouk MS, Al-Atabany W. A modified Henry gas solubility optimization for solving motif discovery problem. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04611-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 10/25/2022]
9
Martínez JC, Randolph LK, Iascone DM, Pernice HF, Polleux F, Hengst U. Pum2 Shapes the Transcriptome in Developing Axons through Retention of Target mRNAs in the Cell Body. Neuron 2019; 104:931-946.e5. [PMID: 31606248 DOI: 10.1016/j.neuron.2019.08.035] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/18/2018] [Revised: 05/31/2019] [Accepted: 08/21/2019] [Indexed: 02/07/2023]
Abstract
Localized protein synthesis is fundamental for neuronal development, maintenance, and function. Transcriptomes in axons and soma are distinct, but the mechanisms governing the composition of axonal transcriptomes and their developmental regulation are only partially understood. We found that the binding motif for the RNA-binding proteins Pumilio 1 and 2 (Pum1 and Pum2) is underrepresented in transcriptomes of developing axons. Introduction of Pumilio-binding elements (PBEs) into mRNAs containing a β-actin zipcode prevented axonal localization and translation. Pum2 is restricted to the soma of developing neurons, and Pum2 knockdown or blocking its binding to mRNA caused the appearance and translation of PBE-containing mRNAs in axons. Pum2-deficient neurons exhibited axonal growth and branching defects in vivo and impaired axon regeneration in vitro. These results reveal that Pum2 shapes axonal transcriptomes by preventing the transport of PBE-containing mRNAs into axons, and they identify somatic mRNA retention as a mechanism for the temporal control of intra-axonal protein synthesis.
Affiliation(s)
- José C Martínez
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY 10032, USA; The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA
- Lisa K Randolph
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA; Doctoral Program in Neurobiology and Behavior, Columbia University, New York, NY 10027, USA
- Daniel Maxim Iascone
- Doctoral Program in Neurobiology and Behavior, Columbia University, New York, NY 10027, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Helena F Pernice
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA; Department of Anatomy and Cell Biology, Biomedical Center, Medical Faculty, Ludwig Maximilians University, 82152 Planegg-Martinsried, Germany
- Franck Polleux
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Department of Neuroscience, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10027, USA; Kavli Institute for Brain Science, Columbia University, New York, NY 10027, USA
- Ulrich Hengst
- The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA; Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY 10032, USA.
10
Polishchuk M, Paz I, Yakhini Z, Mandel-Gutfreund Y. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data. Nucleic Acids Res 2019; 46:W221-W228. [PMID: 29800452 PMCID: PMC6030986 DOI: 10.1093/nar/gky453] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/01/2018] [Accepted: 05/13/2018] [Indexed: 01/24/2023]
Abstract
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite great advances in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data generated from in-vivo experiments. SMARTIV is unique in that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy to interpret visually. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
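The idea of finding enriched k-mers over a combined sequence-and-structure alphabet can be sketched as follows. This assumes a hypothetical encoding (bases paired in a dot-bracket string become lowercase, unpaired bases stay uppercase), a ranked list already split into top and bottom sets, and a pseudocount-smoothed frequency ratio as the enrichment score. It does not reproduce SMARTIV's alphabet, statistics, or P-values.

```python
from collections import Counter


def combined_string(seq, structure):
    """Merge a sequence with its dot-bracket structure into one string.

    Hypothetical encoding: bases paired in the structure ('(' or ')') become
    lowercase, unpaired bases stay uppercase.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return "".join(b.lower() if s in "()" else b.upper()
                   for b, s in zip(seq, structure))


def kmer_counts(strings, k):
    """Count all overlapping k-mers across a list of strings."""
    counts = Counter()
    for s in strings:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(top, bottom, k=3, pseudocount=1.0):
    """Pseudocount-smoothed ratio of k-mer frequency in top vs bottom sets."""
    top_c, bot_c = kmer_counts(top, k), kmer_counts(bottom, k)
    kmers = set(top_c) | set(bot_c)
    top_total = sum(top_c.values()) + pseudocount * len(kmers)
    bot_total = sum(bot_c.values()) + pseudocount * len(kmers)
    return {km: ((top_c[km] + pseudocount) / top_total)
            / ((bot_c[km] + pseudocount) / bot_total)
            for km in kmers}


print(combined_string("ACGU", "(..)"))
print(kmer_enrichment(["AAAA", "AAAA"], ["CCCC"], k=2))
```

Because paired and unpaired versions of the same base are distinct symbols, a k-mer such as `UGu` only scores as enriched when both the bases and their pairing state recur in the top-ranked set.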
Affiliation(s)
- Maya Polishchuk
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Vavilov Institute of General Genetics, Russian Academy of Science, 11933 Moscow, Russia
- Inbal Paz
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Zohar Yakhini
- School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 46150, Israel; Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Yael Mandel-Gutfreund
- Department of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Department of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel
11
Hashim FA, Mabrouk MS, Al-Atabany W. Review of Different Sequence Motif Finding Algorithms. Avicenna J Med Biotechnol 2019; 11:130-148. [PMID: 31057715 PMCID: PMC6490410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2018] [Accepted: 05/26/2018] [Indexed: 11/05/2022]
Abstract
DNA motif discovery is a primary step in many systems for studying gene function. Motif discovery plays a vital role in the identification of Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms of gene expression regulation. Over the past decades, different algorithms have been used to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches, many of which are time-consuming and easily trapped in local optima. Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome these problems. This paper presents a general classification of motif discovery algorithms with new sub-categories that facilitate building a successful motif discovery algorithm. It also presents a summary comparison between them.
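Probabilistic approaches typically represent a motif as a position weight matrix (PWM) and score candidate windows against it. The minimal sketch below builds a log-odds PWM from aligned sites and finds the best-scoring window in a sequence. It assumes uppercase ACGT input, a uniform background, and a fixed pseudocount. The toy sites are invented for illustration, and the function names are hypothetical rather than taken from any tool in the review.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix (in bits) from aligned, equal-length sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window; input must be ACGT only."""
    width = len(pwm)
    best_start, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pwm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Toy aligned sites invented for illustration
pwm = build_pwm(["TGACTCA", "TGACTCA", "TGAGTCA"])
print(best_hit(pwm, "GGGGTGACTCAGGG"))
```

Probabilistic motif finders such as expectation-maximization or Gibbs sampling approaches iteratively re-estimate a matrix like this from their current best sites, which is where the local-optimum problem mentioned above arises.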
Affiliation(s)
- Fatma A. Hashim
- Department of Biomedical Engineering, Helwan University, Egypt
- Mai S. Mabrouk
- Department of Biomedical Engineering, Misr University for Science and Technology (MUST), Egypt
12
Hashim FA, Mabrouk MS, Atabany WA. Comparative Analysis of DNA Motif Discovery Algorithms: A Systemic Review. Current Cancer Therapy Reviews 2019. [DOI: 10.2174/1573394714666180417161728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background:
Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to deal with biological data. The DNA motif discovery problem is the main challenge of genome biology, and its importance grows in direct proportion to the output of sequencing technologies, which produce large amounts of data. A DNA motif is a repeated portion of DNA sequences of major biological interest with important structural and functional features. Motif discovery plays a vital role in antibody-biomarker identification, which is useful for the diagnosis of disease, and in identifying Transcription Factor Binding Sites (TFBSs) that help in learning the mechanisms of gene expression regulation. Recently, scientists discovered that TFBSs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery.
Methods:
Over the past decades, many attempts have used different algorithms to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches.
Results:
Many DNA motif discovery algorithms are time-consuming and easily trapped in local optima.
Conclusion:
Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome the problems of consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories. It also presents a summary comparison between them.
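A word-based (consensus-style) search can be illustrated with the simplest exhaustive method for the (l, d) motif problem: enumerate every possible l-mer and keep those that occur, with at most d mismatches, in every input sequence. The 4^l enumeration makes the cost of such approaches concrete as motif length grows. This is a generic textbook sketch with hypothetical names and toy sequences, not an algorithm from the review.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(motif, seq, d):
    """True if motif occurs somewhere in seq with at most d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def exhaustive_consensus(seqs, l, d):
    """Enumerate all 4**l candidate l-mers; keep those found in every sequence."""
    candidates = ("".join(m) for m in product("ACGT", repeat=l))
    return [m for m in candidates
            if all(occurs_with_mismatches(m, s, d) for s in seqs)]


seqs = ["AAGTCAA", "CCGTCCC", "TTGTCTT"]  # toy sequences sharing GTC
print(exhaustive_consensus(seqs, 3, 0))
```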
Affiliation(s)
- Fatma A. Hashim
- Department of Biomedical Engineering, Helwan University, Helwan, Egypt
- Mai S. Mabrouk
- Department of Biomedical Engineering, Misr University for Science and Technology (MUST), Cairo, Egypt
13
Arce D, Spetale F, Krsticevic F, Cacchiarelli P, Las Rivas JD, Ponce S, Pratta G, Tapia E. Regulatory motifs found in the small heat shock protein (sHSP) gene family in tomato. BMC Genomics 2018; 19:860. [PMID: 30537925 PMCID: PMC6288846 DOI: 10.1186/s12864-018-5190-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/20/2023]
Abstract
BACKGROUND In living organisms, small heat shock proteins (sHSPs) are triggered in response to stress situations. This family of proteins is large in plants and, in the case of tomato (Solanum lycopersicum), 33 genes have been identified, most of them related to heat stress response and to the ripening process. Transcriptomic and proteomic studies have revealed complex patterns of expression for these genes. In this work, we investigate the coregulation of these genes by performing a computational analysis of their promoter architecture to find regulatory motifs known as heat shock elements (HSEs). We leverage the presence of sHSP members that originated from tandem duplication events and analyze the promoter architecture diversity of the whole sHSP family, focusing on the identification of HSEs. RESULTS We performed a search for conserved genomic sequences in the promoter regions of the sHSPs of tomato, plus several other proteins (mainly HSPs) that are functionally related to heat stress situations or to ripening. Several computational analyses were performed to build multiple sequence motifs and identify transcription factor binding sites (TFBS) homologous to HSF1AE and HSF21 in Arabidopsis. We also investigated the expression and interaction of these proteins under two heat stress situations in whole tomato plants and in protoplast cells, both in the presence and in the absence of heat shock transcription factor A2 (HsfA2). The results of these analyses indicate that different sHSPs are up-regulated depending on the activation or repression of HsfA2, a key regulator of HSPs. Further, the analysis of protein-protein interaction between the sHSP protein family and other heat shock response proteins (Hsp70, Hsp90 and MBF1c) suggests that several sHSPs are mediating alternative stress response through a regulatory subnetwork that is not dependent on HsfA2. 
CONCLUSIONS Overall, this study identifies two regulatory motifs (HSF1AE and HSF21) associated with the sHSP family in tomato which are considered genomic HSEs. The study also suggests that, despite the apparent redundancy of these proteins, which has been linked to gene duplication, tomato sHSPs showed different up-regulation and different interaction patterns when analyzed under different stress situations.
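A very simple way to screen promoters for candidate HSEs is a regular-expression scan. Canonical HSEs are built from inverted repeats of the 5-bp unit nGAAn, so the sketch below looks only for a minimal head-to-head pair, nGAAnnTTCn (core GAAnnTTC). That core is its own reverse complement, so scanning one strand also catches sites on the other. This is an illustrative assumption, not the HSF1AE/HSF21 matrices the authors used, and the function name is hypothetical.

```python
import re

# Head-to-head pair of nGAAn / nTTCn units (core GAAnnTTC). The lookahead
# allows overlapping matches; the core is palindromic under reverse
# complement, so one strand suffices.
HSE_CORE = re.compile(r"(?=(GAA[ACGT]{2}TTC))")


def find_hse_cores(promoter):
    """Return 0-based start positions of HSE core matches in a promoter."""
    return [m.start() for m in HSE_CORE.finditer(promoter.upper())]


print(find_hse_cores("ttGAAccTTCgg"))
```

Real HSE searches usually require three or more alternating units and allow mismatches, so a scan like this is best treated as a coarse first filter.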
Affiliation(s)
- Debora Arce
- IICAR-CONICET, Facultad de Ciencias Agrarias, Universidad Nacional de Rosario, Campo Experimental Villarino, Zavalla, S2125ZAA Argentina
- Flavio Spetale
- CIFASIS - CONICET, Ocampo y Esmeralda, Rosario, S2000EZP Argentina
- Paolo Cacchiarelli
- IICAR-CONICET, Facultad de Ciencias Agrarias, Universidad Nacional de Rosario, Campo Experimental Villarino, Zavalla, S2125ZAA Argentina
- Javier De Las Rivas
- Cancer Research Center CiC-IBMCC, CSIC/USAL, Campus Miguel de Unamuno s/n, Salamanca, 37007 Spain
- Sergio Ponce
- GADIB-FRSN-UTN, Colon 332, San Nicolas, B2900LWH Argentina
- Guillermo Pratta
- IICAR-CONICET, Facultad de Ciencias Agrarias, Universidad Nacional de Rosario, Campo Experimental Villarino, Zavalla, S2125ZAA Argentina
- Elizabeth Tapia
- CIFASIS - CONICET, Ocampo y Esmeralda, Rosario, S2000EZP Argentina
- Faculty of Exact Sciences, Engineering and Surveying, Av. Pellegrini 250, Rosario, S2000BTP Argentina
14
Nielsen MM, Tataru P, Madsen T, Hobolth A, Pedersen JS. Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments. Algorithms Mol Biol 2018; 13:17. [PMID: 30555524 PMCID: PMC6286601 DOI: 10.1186/s13015-018-0135-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/23/2018] [Accepted: 12/01/2018] [Indexed: 12/23/2022]
Abstract
Background Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces, and thus more complex motifs may not be included. There is thus a need for motif analysis methods that are tailored for analyzing specific complex motifs motivated by biological questions and hypotheses rather than acting as a screen-based motif finding tool. Methods We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems. Results We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity. Conclusions Regmex is a useful and flexible tool to analyze motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
Electronic supplementary material The online version of this article (10.1186/s13015-018-0135-2) contains supplementary material, which is available to authorized users.
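The Brownian-bridge idea for detecting a motif bias along a ranked list can be sketched in Python as a simplified stand-in. Here each sequence contributes only regex presence or absence, not Regmex's Markov-model p-values that account for sequence length and composition. The centered cumulative sum is scaled so that it behaves approximately like a standard Brownian bridge, and its maximum is compared with the Kolmogorov distribution. Regmex itself is an R package; these function names are hypothetical.

```python
import math
import re


def bridge_statistic(ranked_sequences, motif_regex):
    """Scaled maximum of a centered cumulative sum of motif hits.

    ranked_sequences must be sorted by the experimental score. Each sequence
    contributes 1 if the regular expression matches and 0 otherwise;
    subtracting the overall hit rate p makes the walk end at 0, and dividing
    by sqrt(n * p * (1 - p)) makes it approximately a standard Brownian
    bridge for large n.
    """
    pattern = re.compile(motif_regex)
    hits = [1.0 if pattern.search(s) else 0.0 for s in ranked_sequences]
    n = len(hits)
    p = sum(hits) / n
    if p in (0.0, 1.0):
        return 0.0  # motif absent or present everywhere: no ranking bias
    walk, peak = 0.0, 0.0
    for h in hits:
        walk += h - p
        peak = max(peak, abs(walk))
    return peak / math.sqrt(n * p * (1 - p))


def bridge_pvalue(x, terms=50):
    """Asymptotic P(max |B(t)| > x) for a standard Brownian bridge.

    Uses the two classical Kolmogorov series: one converging quickly for
    small x, the other for large x.
    """
    if x <= 0:
        return 1.0
    if x < 1.0:
        cdf = math.sqrt(2 * math.pi) / x * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * x * x))
            for k in range(1, terms + 1))
        return min(1.0, max(0.0, 1.0 - cdf))
    tail = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
                   for k in range(1, terms + 1))
    return min(1.0, max(0.0, tail))


# Toy ranked list: the motif is concentrated at the top of the ranking
ranked = ["ACGTAAA"] * 5 + ["CCCCCCC"] * 5
stat = bridge_statistic(ranked, "ACG[TU]")
print(stat, bridge_pvalue(stat))
```

The asymptotic p-value is only a rough guide for short lists like this toy example.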
15
Vitkin E, Solomon O, Sultan S, Yakhini Z. Genome-wide analysis of fitness data and its application to improve metabolic models. BMC Bioinformatics 2018; 19:368. [PMID: 30305012 PMCID: PMC6180484 DOI: 10.1186/s12859-018-2341-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/03/2018] [Accepted: 08/28/2018] [Indexed: 11/17/2022]
Abstract
Background Synthetic biology and related techniques enable genome-scale high-throughput investigation of the effect on organism fitness of different gene knock-downs/outs and of other modifications of genomic sequence. Results We develop statistical and computational pipelines and frameworks for analyzing high-throughput fitness data over a genome-scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate differences and determinants of the effect on fitness in different conditions. Comparing fitness vectors of genes across tens of conditions, we observe that fitness consequences strongly depend on genomic location and more weakly depend on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction network neighborhood to associate this reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology with known reactions using a leave-one-out approach. Specifically, using top-20 candidates selected based on combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, as compared to 33% based on fitness and to 26% based on expression separately, and to 4.02% as a random baseline. Our model improvement results include a novel association of a gene to an orphan cytosine nucleosidation reaction.
Conclusion Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data. Electronic supplementary material The online version of this article (10.1186/s12859-018-2341-9) contains supplementary material, which is available to authorized users.
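The leave-one-out validation above reports the fraction of known reactions whose gene is recovered among the top-20 candidates. A sketch of that bookkeeping follows. It assumes the candidate ranking for each held-out reaction has already been produced; how Vitkin et al. score candidates from fitness and expression is not reproduced here, and the function name and toy data are hypothetical.

```python
def top_k_hit_rate(ranked_candidates, true_genes, k=20):
    """Fraction of held-out reactions whose known gene appears in the top k.

    ranked_candidates: dict reaction -> list of candidate genes, best first
    true_genes: dict reaction -> set of genes known to encode the catalyst
    """
    if not true_genes:
        return 0.0
    hits = sum(1 for reaction, genes in true_genes.items()
               if genes & set(ranked_candidates.get(reaction, [])[:k]))
    return hits / len(true_genes)


# Hypothetical rankings for two held-out reactions
ranked = {"r1": ["g1", "g2"], "r2": ["g3", "g4"]}
known = {"r1": {"g2"}, "r2": {"g9"}}
print(top_k_hit_rate(ranked, known, k=1), top_k_hit_rate(ranked, known, k=2))
```

Computing the same rate with randomly shuffled candidate lists gives the kind of random baseline quoted above.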
Affiliation(s)
- Edward Vitkin
- Department of Computer Science, Technion, Haifa, Israel
- Oz Solomon
- Faculty of Biotechnology and Food Engineering, Technion, Haifa, Israel; School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.
- Sharon Sultan
- School of Computer Science, The Interdisciplinary Center, Herzliya, Israel
- Zohar Yakhini
- Department of Computer Science, Technion, Haifa, Israel; School of Computer Science, The Interdisciplinary Center, Herzliya, Israel.
16
Sasse A, Laverty KU, Hughes TR, Morris QD. Motif models for RNA-binding proteins. Curr Opin Struct Biol 2018; 53:115-123. [PMID: 30172081 DOI: 10.1016/j.sbi.2018.08.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/05/2018] [Accepted: 08/07/2018] [Indexed: 01/24/2023]
Abstract
Identifying the binding preferences of RNA-binding proteins (RBPs) is important for understanding their contribution to post-transcriptional regulation. Here, we review the current state of the art of RNA motif identification tools for RBPs. New in vivo and in vitro data sets provide sufficient statistical power to detect relatively long and complex sequence and sequence-structure binding preferences, and recent computational methods are geared towards quantitative identification of these patterns. We classify methods by the representational power of their motif models and describe the underlying considerations for RNA-protein interactions. All classical motif identification algorithms apply physically motivated architectures consisting of a motif model and an occupancy model; we call these explicit motif models. Recent methods, such as convolutional neural networks and support vector machines, abandon the classical architecture and model RNA binding implicitly, without defining a motif model. Although they achieve high accuracy on held-out data, they may be unsuitable for the ultimate goal of the field: using motifs trained on in vitro data to predict in vivo binding sites. For this task, methods need to separate intrinsic binding preferences from cellular effects arising from protein and RNA concentrations, cooperativity, and competition. To tackle this problem, we advocate a 'three-layer' architecture, consisting of a motif model, an occupancy model, and an extrinsic factor model, which enables separation of, and adjustment to, cellular conditions.
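The 'three-layer' idea can be made concrete with a toy model. A minimal sketch, assuming a position weight matrix (PWM) of log-affinities as the motif model, independent sites with Langmuir-style binding as the occupancy model, and a single hypothetical `free_fraction` factor standing in for extrinsic cellular effects such as sequestration or competition; none of these choices is taken from the review itself.

```python
import numpy as np

BASES = "ACGU"

def site_scores(seq, pwm):
    """Layer 1 (motif model): summed log-affinity of every window of length
    pwm.shape[0]; pwm has one row per motif position and one column per base."""
    w = pwm.shape[0]
    idx = [BASES.index(b) for b in seq]
    return np.array([sum(pwm[j, idx[i + j]] for j in range(w))
                     for i in range(len(seq) - w + 1)])

def occupancy(scores, concentration):
    """Layer 2 (occupancy model): probability that at least one site is bound,
    treating sites as independent with association constant exp(score)."""
    k = np.exp(scores)
    p_site = concentration * k / (1 + concentration * k)
    return 1 - np.prod(1 - p_site)

def predicted_binding(seq, pwm, total_concentration, free_fraction=1.0):
    """Layer 3 (extrinsic factors): scale the protein concentration by a
    hypothetical free fraction before computing occupancy."""
    effective = total_concentration * free_fraction
    return occupancy(site_scores(seq, pwm), effective)
```

Keeping the layers separate means the same in vitro motif (layer 1) can be reused while only the extrinsic layer is adjusted per cellular condition.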
Affiliation(s)
- Alexander Sasse
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Kaitlin U Laverty
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Timothy R Hughes
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Canadian Institute for Advanced Research, MaRS Centre, West Tower, 661 University Avenue, Suite 505, Toronto, ON M5G 1M1, Canada
- Quaid D Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5T 3A1, Canada
17
Lavallée-Adam M, Cloutier P, Coulombe B, Blanchette M. Functional 5' UTR motif discovery with LESMoN: Local Enrichment of Sequence Motifs in biological Networks. Nucleic Acids Res 2017; 45:10415-10427. [PMID: 28977652 PMCID: PMC5737372 DOI: 10.1093/nar/gkx751] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/22/2016] [Accepted: 08/17/2017] [Indexed: 01/09/2023]
Abstract
Biological networks are rich representations of the relationships between entities such as genes or proteins, and they have become increasingly complete thanks to various high-throughput experimental network-mapping approaches. Here, we propose a method that uses such networks to guide the search for functional sequence motifs. Specifically, we introduce Local Enrichment of Sequence Motifs in biological Networks (LESMoN), an enumerative motif discovery algorithm that identifies 5' untranslated region (UTR) sequence motifs whose associated proteins form unexpectedly dense clusters in a given biological network. When applied to the human protein-protein interaction network from BioGRID, LESMoN identifies several highly significant 5' UTR sequence motifs, including both previously known and uncharacterized ones. The vast majority of these motifs are evolutionarily conserved, and the genes containing them are significantly enriched for various Gene Ontology terms, suggesting new associations between 5' UTR motifs and a number of biological processes. We validate in vivo the role of three LESMoN-identified motifs in regulating protein expression.
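The core question LESMoN asks is whether the proteins whose 5' UTRs carry a motif are more tightly connected in the network than chance would predict. A minimal sketch of that idea, assuming "cluster density" is the edge density of the induced subgraph and significance is estimated by sampling random gene sets of the same size; LESMoN's actual clustering statistic may differ, and all names here are hypothetical.

```python
import itertools
import random

def induced_density(nodes, adjacency):
    """Fraction of possible pairs among `nodes` that are connected;
    `adjacency` maps each gene to the set of its neighbours (symmetric)."""
    nodes = list(nodes)
    pairs = len(nodes) * (len(nodes) - 1) / 2
    if pairs == 0:
        return 0.0
    edges = sum(1 for a, b in itertools.combinations(nodes, 2)
                if b in adjacency.get(a, set()))
    return edges / pairs

def motif_cluster_pvalue(motif, utrs, adjacency, n_random=1000, seed=0):
    """Empirical p-value for the density of genes whose UTR contains `motif`,
    compared with random gene sets of the same size."""
    carriers = [g for g, utr in utrs.items() if motif in utr]
    observed = induced_density(carriers, adjacency)
    rng = random.Random(seed)
    genes = list(utrs)
    null = [induced_density(rng.sample(genes, len(carriers)), adjacency)
            for _ in range(n_random)]
    p = (1 + sum(d >= observed for d in null)) / (1 + n_random)
    return observed, p
```

An enumerative search would call `motif_cluster_pvalue` for every candidate motif and then correct for multiple testing.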
Affiliation(s)
- Mathieu Lavallée-Adam
- McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada; Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada
- Philippe Cloutier
- Translational Proteomics Laboratory, Institut de recherches cliniques de Montréal, Montréal, Québec H2W 1R7, Canada
- Benoit Coulombe
- Translational Proteomics Laboratory, Institut de recherches cliniques de Montréal, Montréal, Québec H2W 1R7, Canada; Département de biochimie et médecine moléculaire, Université de Montréal, Montréal, Québec H3C 3J7, Canada
- Mathieu Blanchette
- McGill Centre for Bioinformatics and School of Computer Science, McGill University, Montréal, Québec H3A 0E9, Canada
18
Levy L, Anavy L, Solomon O, Cohen R, Brunwasser-Meirom M, Ohayon S, Atar O, Goldberg S, Yakhini Z, Amit R. A Synthetic Oligo Library and Sequencing Approach Reveals an Insulation Mechanism Encoded within Bacterial σ54 Promoters. Cell Rep 2017; 21:845-858. [DOI: 10.1016/j.celrep.2017.09.063] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/02/2017] [Revised: 08/30/2017] [Accepted: 09/18/2017] [Indexed: 10/18/2022]
19
Identification and characterization of roles for Puf1 and Puf2 proteins in the yeast response to high calcium. Sci Rep 2017; 7:3037. [PMID: 28596535 PMCID: PMC5465220 DOI: 10.1038/s41598-017-02873-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/11/2016] [Accepted: 04/19/2017] [Indexed: 12/12/2022]
Abstract
Members of the yeast PUF protein family bind unique subsets of mRNA targets that encode proteins with common functions, and they have therefore become a paradigm for post-transcriptional gene control. To provide new insights into the roles of the seemingly redundant Puf1 and Puf2 members, we monitored the growth rates of their deletion strains under many different stress conditions. A differential effect was observed at high CaCl2 concentrations: puf1Δ growth was affected much more than puf2Δ growth, and inhibition was exacerbated in the puf1Δpuf2Δ double knockout. Transcriptome analyses after short- and long-term CaCl2 application defined the transcriptional response to CaCl2 and revealed distinct expression changes in the deletion strains. Intriguingly, mRNAs known to be bound by Puf1 or Puf2 were affected mainly in the double knockout. We focused on the cell wall regulator Zeo1 and observed that puf1Δpuf2Δ fails to maintain low levels of its mRNA. Complementarily, the puf1Δpuf2Δ growth defect in CaCl2 was rescued by further deletion of the Zeo1 gene. Thus, these proteins probably regulate the cell-wall integrity pathway by regulating Zeo1 post-transcriptionally. This work sheds new light on the roles of Puf proteins during the cellular response to environmental stress.
20
Kelil A, Dubreuil B, Levy ED, Michnick SW. Exhaustive search of linear information encoding protein-peptide recognition. PLoS Comput Biol 2017; 13:e1005499. [PMID: 28426660 PMCID: PMC5417721 DOI: 10.1371/journal.pcbi.1005499] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/20/2015] [Revised: 05/04/2017] [Accepted: 04/04/2017] [Indexed: 11/24/2022]
Abstract
High-throughput in vitro methods have been extensively applied to identify linear information that encodes peptide recognition. However, these methods are limited in the number, sequence variation, and length of the peptides that can be explored, and they often produce solutions that are not found in the cell. Despite the many methods developed to address these issues, an exhaustive search of the linear information encoding protein-peptide recognition has so far been physically unfeasible. Here, we describe a strategy, called DALEL, for the exhaustive search of linear sequence information encoded in proteins that bind to a common partner. We applied DALEL to explore the binding specificity of SH3 domains in the budding yeast Saccharomyces cerevisiae. Using only the polypeptide sequences of SH3 domain-binding proteins, we identified the majority of known SH3 binding sites previously discovered either in vitro or in vivo. Moreover, we discovered a number of sites with non-canonical sequences and distinct properties that may serve ancillary roles in peptide recognition. We compared DALEL with a variety of state-of-the-art algorithms in the blind identification of known binding sites of the human Grb2 SH3 domain. We also benchmarked DALEL on curated biological motifs derived from the ELM database to evaluate the effect of increasing or decreasing motif enrichment. Our strategy can be applied together with experimental data on proteins interacting with a common partner to identify binding sites among them. It can also be applied to any group of proteins of interest to identify enriched linear motifs, or to exhaustively explore the space of linear information encoded in a polypeptide sequence. Finally, we have developed a webserver at http://michnick.bcm.umontreal.ca/dalel that offers a user-friendly interface and provides different scenarios for using DALEL.
Here we describe the first strategy for the exhaustive search of the linear information encoding protein-peptide recognition; an approach that has previously been physically unfeasible because the combinatorial space of polypeptide sequences is too vast. The search covers the entire space of sequences with no restriction on motif length or composition, and includes all possible combinations of amino acids at distinct positions of each sequence, as well as positions with correlated preferences for amino acids.
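To illustrate what "exhaustive" enumeration over positions with correlated residue preferences can look like, here is a toy sketch that enumerates every pair of residues at a given spacing and ranks the pairs by how much more often they occur in binding partners than in background proteins. This is not DALEL's algorithm; the gapped-pair pattern, the `max_gap` limit, and the pseudocount-based enrichment ratio are all illustrative assumptions.

```python
from collections import Counter

def gapped_pairs(seq, max_gap):
    """All (residue_i, gap, residue_j) patterns present in one sequence."""
    found = set()
    for i, a in enumerate(seq):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j < len(seq):
                found.add((a, gap, seq[j]))
    return found

def enriched_patterns(foreground, background, max_gap=3, pseudo=1.0):
    """Rank every gapped pair seen in the foreground by the ratio of the
    fraction of foreground vs background sequences that contain it."""
    fg = Counter(p for s in foreground for p in gapped_pairs(s, max_gap))
    bg = Counter(p for s in background for p in gapped_pairs(s, max_gap))

    def ratio(p):
        fg_frac = (fg[p] + pseudo) / (len(foreground) + pseudo)
        bg_frac = (bg[p] + pseudo) / (len(background) + pseudo)
        return fg_frac / bg_frac

    return sorted(fg, key=ratio, reverse=True)
```

Because each sequence contributes a set of patterns, the counts measure how many sequences contain a pattern rather than how many times it occurs.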
Affiliation(s)
- Abdellali Kelil
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D. Levy
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Stephen W. Michnick
- Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Quebec, Canada
21
Zhou Q, Hahn JK, Neupane B, Aidery P, Labeit S, Gawaz M, Gramlich M. Dysregulated IER3 Expression is Associated with Enhanced Apoptosis in Titin-Based Dilated Cardiomyopathy. Int J Mol Sci 2017; 18:E723. [PMID: 28353642 PMCID: PMC5412309 DOI: 10.3390/ijms18040723] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/09/2017] [Revised: 03/02/2017] [Accepted: 03/24/2017] [Indexed: 12/22/2022]
Abstract
Apoptosis (type I programmed cell death) of cardiomyocytes is a major process in the progression of heart failure. The early response gene IER3 regulates apoptosis in a wide variety of cells and organs; however, its role in heart failure is largely unknown. Here, we investigate the role of IER3 in an inducible heart failure mouse model. Heart failure was induced in a mouse model that mimics a human titin truncation mutation we found in a patient with dilated cardiomyopathy (DCM). Terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL) and ssDNA staining showed induction of apoptosis in titin-deficient cardiomyocytes during heart failure development, while the IER3 response was dysregulated. Chromatin immunoprecipitation and knock-down experiments revealed that IER3 proteins target the promoters of anti-apoptotic genes and act as an anti-apoptotic factor in cardiomyocytes. IER3 expression is blunted during heart failure development in the titin-deficient mouse model. Targeting the IER3 pathway to reduce cardiac apoptosis might be an effective therapeutic strategy to combat heart failure.
Affiliation(s)
- Qifeng Zhou
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
- Julia Kelley Hahn
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
- Balram Neupane
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
- Parwez Aidery
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
- Siegfried Labeit
- Institute for Integrative Pathophysiology, Universitätsmedizin Mannheim, 68167 Mannheim, Germany.
- Meinrad Gawaz
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
- Michael Gramlich
- Department of Cardiology and Cardiovascular Diseases, Eberhard Karls University, 72076 Tübingen, Germany.
22
Polishchuk M, Paz I, Kohen R, Mesika R, Yakhini Z, Mandel-Gutfreund Y. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data. Methods 2017; 118-119:73-81. [PMID: 28274760 DOI: 10.1016/j.ymeth.2017.03.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/05/2017] [Revised: 02/28/2017] [Accepted: 03/03/2017] [Indexed: 01/08/2023]
Abstract
RNA-binding proteins (RBPs) play an important role in regulating many processes in the cell. RBPs often recognize their RNA targets in a specific manner: in addition to the RNA primary sequence, RNA structure has been shown to play a central role in recognition by RBPs. In recent years, many experimental approaches, both in vitro and in vivo, have been developed and employed to identify and characterize RBP targets and extract their binding specificities. In vivo binding techniques, such as CrossLinking and ImmunoPrecipitation (CLIP)-based methods, enable the characterization of protein binding sites on RNA targets. However, these methods do not provide information on the structural preferences of the protein. While methods to obtain RNA structure are available, inferring both the sequence and structure preferences of RBPs remains a challenge. Here we present SMARTIV, a novel computational tool for discovering combined sequence and structure binding motifs from in vivo RNA binding data, relying on the sequences of the target sites, the ranking of their binding scores, and their predicted secondary structure. The combined motifs are provided in a unified representation that is informative and easy to interpret visually. We tested the method on CLIP-seq data from different platforms for a variety of RBPs. Overall, our results are highly consistent with known binding motifs of RBPs and offer additional information on their structural preferences.
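One way to picture a "combined sequence and structure" motif is to merge each nucleotide with its predicted pairing state into a single alphabet and then look for k-mers that are over-represented among the highest-ranked binding sites. A minimal sketch under those assumptions: the uppercase-paired/lowercase-unpaired encoding, the 50% top/bottom split, and the frequency-difference score are all hypothetical choices, not SMARTIV's representation or statistic.

```python
from collections import Counter

def combine(seq, dot_bracket):
    """Merge sequence and predicted secondary structure into one alphabet:
    uppercase = paired, lowercase = unpaired (a hypothetical encoding)."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    return "".join(b.upper() if s in "()" else b.lower()
                   for b, s in zip(seq, dot_bracket))

def kmer_set(s, k):
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def top_vs_bottom(sites, k=3, fraction=0.5):
    """sites: list of (binding_score, sequence, dot_bracket).
    Rank combined k-mers by (fraction of top-ranked sites containing them)
    minus (fraction of bottom-ranked sites containing them)."""
    ranked = sorted(sites, key=lambda x: x[0], reverse=True)
    cut = int(len(ranked) * fraction)
    top = [combine(s, d) for _, s, d in ranked[:cut]]
    bottom = [combine(s, d) for _, s, d in ranked[cut:]]
    ct = Counter(m for s in top for m in kmer_set(s, k))
    cb = Counter(m for s in bottom for m in kmer_set(s, k))
    diff = {m: ct[m] / len(top) - cb[m] / max(len(bottom), 1) for m in ct}
    return sorted(diff, key=diff.get, reverse=True)
```

Because "uuu" (unpaired) and "UUU" (paired) are different symbols, the same sequence motif can score differently depending on its structural context.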
Affiliation(s)
- Maya Polishchuk
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Vavilov Institute of General Genetics, Russian Academy of Science, Moscow 11933, Russia
- Inbal Paz
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Refael Kohen
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Rona Mesika
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Zohar Yakhini
- Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel; School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 46150, Israel
- Yael Mandel-Gutfreund
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa 32000, Israel; Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa 32000, Israel.
23
Pan X, Shen HB. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics 2017; 18:136. [PMID: 28245811 PMCID: PMC5331642 DOI: 10.1186/s12859-017-1561-8] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Received: 11/15/2016] [Accepted: 02/23/2017] [Indexed: 01/08/2023]
Abstract
Background RNAs play key roles in cells through their interactions with RNA-binding proteins (RBPs), and the binding motifs of these proteins are crucial for understanding the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs, and why they bind specific positions, is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from rapidly growing multi-source data (e.g. sequence and structure), their domain-specific features and formats pose significant computational challenges. One current difficulty is that the knowledge shared across sources lies at a higher level of abstraction than the observed data, so directly integrating observed data across domains is inefficient. Another difficulty is how to interpret the prediction results: existing approaches tend to stop after outputting potential discrete binding sites on the sequences, and how to assemble these into meaningful binding motifs is a topic worthy of further investigation. Results In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid of a convolutional neural network and a deep belief network to predict RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstract feature space using multiple layers of learning blocks, in which the shared representations across different domains are integrated.
To validate iDeep, we performed experiments on 31 large-scale CLIP-seq datasets. Our results show that integrating multiple sources of data improves the average AUC by 8% compared with the best single-source predictor, and that through cross-domain knowledge integration at an abstraction level, iDeep outperforms state-of-the-art predictors by 6%. Besides the overall improvement in prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with experimentally verified results, suggesting that iDeep is a promising approach for real-world applications. Conclusion The iDeep framework not only achieves better performance than state-of-the-art predictors but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1561-8) contains supplementary material, which is available to authorized users.
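The claim that a CNN module "captures interpretable binding motifs" rests on the fact that a first-layer convolutional filter scanned over a one-hot sequence behaves like a motif detector. A minimal NumPy sketch of a single filter, assuming one-hot RNA input, a valid 1D cross-correlation, and a ReLU; the filter here is hand-set for illustration, whereas a trained network such as iDeep (which also includes a deep belief network component not shown) learns its filters from data.

```python
import numpy as np

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        x[i, ALPHABET.index(b)] = 1.0
    return x

def conv_scan(x, filt, bias=0.0):
    """Valid 1D cross-correlation of one (width, 4) filter over a one-hot
    sequence, followed by a ReLU, as in a CNN's first layer."""
    w = filt.shape[0]
    out = np.array([np.sum(x[i:i + w] * filt) for i in range(x.shape[0] - w + 1)])
    return np.maximum(out + bias, 0.0)

def filter_to_motif(filt):
    """Read a consensus motif from a filter by taking the best base per position."""
    return "".join(ALPHABET[j] for j in filt.argmax(axis=1))
```

The position of the maximum activation marks a putative binding site, and reading the filter row by row (or aligning the highest-activating windows) turns it back into a motif.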
Affiliation(s)
- Xiaoyong Pan
- Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Copenhagen, Denmark.
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China.
24
Geffen Y, Appleboim A, Gardner RG, Friedman N, Sadeh R, Ravid T. Mapping the Landscape of a Eukaryotic Degronome. Mol Cell 2016; 63:1055-65. [PMID: 27618491 DOI: 10.1016/j.molcel.2016.08.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 05/06/2016] [Revised: 07/11/2016] [Accepted: 08/02/2016] [Indexed: 12/16/2022]
Abstract
The ubiquitin-proteasome system (UPS) for protein degradation has been studied intensively, yet we have only a partial understanding of the mechanisms by which proteins are selected for proteolysis. One obstacle to studying these recognition pathways is the limited repertoire of known degradation signals (degrons). To better understand what determines the susceptibility of intracellular proteins to degradation by the UPS, we developed an unbiased method for large-scale identification of eukaryotic degrons. Using a reporter-based high-throughput competition assay followed by deep sequencing, we measured a degradation potency index for thousands of native polypeptides in a single experiment. We further used this method to identify protein quality control (PQC)-specific and compartment-specific degrons. Our method provides an unprecedented insight into the yeast degronome, and it can readily be modified to study protein degradation signals and pathways in other organisms and settings.
Affiliation(s)
- Yifat Geffen
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Alon Appleboim
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel; School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Richard G Gardner
- Department of Pharmacology, University of Washington, Seattle, WA 98195, USA
- Nir Friedman
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel; School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel.
- Ronen Sadeh
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel; School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel.
- Tommer Ravid
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel.
25
Giudice G, Sánchez-Cabo F, Torroja C, Lara-Pezzi E. ATtRACT-a database of RNA-binding proteins and associated motifs. Database (Oxford) 2016; 2016:baw035. [PMID: 27055826 PMCID: PMC4823821 DOI: 10.1093/database/baw035] [Citation(s) in RCA: 150] [Impact Index Per Article: 18.8] [Received: 10/01/2015] [Accepted: 03/01/2016] [Indexed: 12/21/2022]
Abstract
RNA-binding proteins (RBPs) play a crucial role in key cellular processes, including RNA transport, splicing, polyadenylation and stability. Understanding the interaction between RBPs and RNA is key to improving our knowledge of RNA processing, localization and regulation in a global manner. Despite advances in recent years, a unified non-redundant resource that includes information on experimentally validated motifs, RBPs and integrated tools to exploit this information is lacking. Here, we developed a database named ATtRACT (available at http://attract.cnic.es) that compiles information on 370 RBPs and 1583 RBP consensus binding motifs, 192 of which are not present in any other database. To populate ATtRACT we (i) extracted and hand-curated experimentally validated data from the CISBP-RNA, SpliceAid-F and RBPDB databases, (ii) integrated and updated the unavailable ASD database and (iii) extracted information from Protein-RNA complexes present in the Protein Data Bank database through computational analyses. ATtRACT also provides efficient algorithms to search for a specific motif and to scan one or more RNA sequences at a time. It also allows discovering de novo motifs enriched in a set of related sequences and comparing them with the motifs included in the database. Database URL: http://attract.cnic.es
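Scanning RNA sequences for a consensus binding motif, one of the operations the abstract lists, can be sketched with regular expressions once IUPAC ambiguity codes are expanded into character classes. A minimal sketch, assuming motifs are given as IUPAC RNA consensus strings and that overlapping matches should all be reported; this is an illustration, not ATtRACT's scanning algorithm.

```python
import re

# IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {"A": "A", "C": "C", "G": "G", "U": "U",
             "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
             "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
             "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]"}

def consensus_to_regex(motif):
    """Expand an IUPAC consensus such as 'RUGU' into a regex string."""
    return "".join(IUPAC_RNA[c] for c in motif.upper())

def scan(sequences, motif):
    """Return {name: [0-based start positions]} of all, possibly overlapping,
    matches; DNA input is converted to RNA by replacing T with U."""
    # A zero-width lookahead lets finditer report overlapping hits.
    pattern = re.compile("(?=(" + consensus_to_regex(motif) + "))")
    return {name: [m.start() for m in pattern.finditer(seq.upper().replace("T", "U"))]
            for name, seq in sequences.items()}
```

Consensus matching is all-or-nothing; scoring with a position weight matrix instead would allow graded matches.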
Affiliation(s)
- Girolamo Giudice
- Centro Nacional de Investigaciones Cardiovasculares Carlos III, Melchor Fernández Almagro 3, Madrid 28029, Spain
- Carlos Torroja
- Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares, Melchor Fernández Almagro 3, Madrid 28029, Spain
- Enrique Lara-Pezzi
- Centro Nacional de Investigaciones Cardiovasculares Carlos III, Melchor Fernández Almagro 3, Madrid 28029, Spain; National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London SW7 2AZ, UK
26
Tangirala K, Herndon N, Caragea D. A Comparative Analysis Between k-Mers and Community Detection-Based Features for the Task of Protein Classification. IEEE Trans Nanobioscience 2016; 15:84-92. [PMID: 26863669 PMCID: PMC6245644 DOI: 10.1109/tnb.2016.2523501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Machine learning algorithms are widely used to annotate biological sequences, and low-dimensional informative feature vectors can be crucial for their performance. In prior work, we proposed a community detection approach to construct low-dimensional feature sets for nucleotide sequence classification. That approach used the Hamming distance between short nucleotide subsequences, called k-mers, to construct a network, and then used community detection to identify groups of k-mers that appear frequently in a set of sequences. Whereas this approach worked well for nucleotide sequence classification, it could not be used directly for protein sequences, because the Hamming distance is not a good measure for comparing short protein k-mers. To address this limitation, we extended our prior approach by replacing the Hamming distance with substitution scores. Experimental results in different learning scenarios show that the features generated with the new approach are more informative than k-mers.
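The prior nucleotide pipeline described above (k-mer network by Hamming distance, then groups of k-mers as features) can be sketched in a few lines. A minimal sketch, assuming k-mers are linked when they differ by at most `max_dist` mismatches and using connected components as a simple stand-in for a real community detection algorithm; the protein extension with substitution scores is not shown, and all names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))

def kmer_counts(seq, k):
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts

def kmer_groups(kmers, max_dist=1):
    """Connected components of the graph joining k-mers within max_dist
    mismatches (a stand-in for community detection), via union-find."""
    kmers = sorted(kmers)
    parent = {k: k for k in kmers}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(kmers, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)
    groups = {}
    for k in kmers:
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

def group_features(seq, groups, k):
    """One feature per group: total count of the group's k-mers in seq."""
    counts = kmer_counts(seq, k)
    return [sum(counts.get(m, 0) for m in g) for g in groups]
```

The feature vector has one entry per group rather than one per possible k-mer, which is what makes it low-dimensional.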
27
RNA Bioinformatics for Precision Medicine. Adv Exp Med Biol 2016; 939:21-38. [DOI: 10.1007/978-981-10-1503-8_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
28
Faigenbloom L, Rubinstein ND, Kloog Y, Mayrose I, Pupko T, Stein R. Regulation of alternative splicing at the single-cell level. Mol Syst Biol 2015; 11:845. [PMID: 26712315 PMCID: PMC4704489 DOI: 10.15252/msb.20156278] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Alternative splicing is a key cellular mechanism for generating distinct isoforms, whose relative abundances regulate critical cellular processes. It is therefore essential that inclusion levels of alternative exons be tightly regulated. However, how the precision of inclusion levels among individual cells is governed is poorly understood. Using single-cell gene expression, we show that the precision of inclusion levels of alternative exons is determined by the degree of evolutionary conservation at their flanking intronic regions. Moreover, the inclusion levels of alternative exons, as well as the expression levels of the transcripts harboring them, also contribute to this precision. We further show that alternative exons whose inclusion levels are considerably changed during stem cell differentiation are also subject to this regulation. Our results imply that alternative splicing is coordinately regulated to achieve accuracy in relative isoform abundances and that such accuracy may be important in determining cell fate.
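To make "precision of inclusion levels among individual cells" concrete, one common summary is the per-cell percent-spliced-in (PSI) value and its spread across cells. A minimal sketch, assuming per-cell inclusion and exclusion isoform measurements are available and using the coefficient of variation (CV) of PSI as a hypothetical precision measure (lower CV means more precise); the paper's own metric may differ.

```python
import numpy as np

def psi(inclusion, exclusion):
    """Per-cell percent-spliced-in: inclusion / (inclusion + exclusion)."""
    inc = np.asarray(inclusion, dtype=float)
    exc = np.asarray(exclusion, dtype=float)
    return inc / (inc + exc)

def precision_summary(inclusion, exclusion):
    """Return (mean PSI, coefficient of variation of PSI across cells)."""
    values = psi(inclusion, exclusion)
    mean = values.mean()
    return float(mean), float(values.std(ddof=1) / mean)
```

Two exons with the same mean PSI can then be compared by CV alone, which separates "how much is included" from "how consistently it is included".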
Affiliation(s)
- Lior Faigenbloom
- The Department of Neurobiology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Nimrod D Rubinstein
- The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yoel Kloog
- The Department of Neurobiology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Itay Mayrose
- The Department of Molecular Biology and Ecology of Plants, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
- The Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Reuven Stein
- The Department of Neurobiology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
29
Eshar S, Altenhofen L, Rabner A, Ross P, Fastman Y, Mandel-Gutfreund Y, Karni R, Llinás M, Dzikowski R. PfSR1 controls alternative splicing and steady-state RNA levels in Plasmodium falciparum through preferential recognition of specific RNA motifs. Mol Microbiol 2015; 96:1283-97. [PMID: 25807998 DOI: 10.1111/mmi.13007] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Accepted: 03/23/2015] [Indexed: 11/28/2022]
Abstract
Plasmodium species have evolved complex biology to adapt to different hosts and changing environments throughout their life cycle. Remarkably, these adaptations are achieved by a relatively small genome. One way by which the parasite expands its proteome is through alternative splicing (AS). We recently identified PfSR1 as a bona fide Ser/Arg-rich (SR) protein that shuttles between the nucleus and cytoplasm and regulates AS in Plasmodium falciparum. Here we show that PfSR1 is localized adjacent to the Nuclear Pore Complex (NPC) clusters in the nucleus of early stage parasites. To identify the endogenous RNA targets of PfSR1, we adapted an inducible overexpression system for tagged PfSR1 and performed RNA immunoprecipitation followed by microarray analysis (RIP-chip) to recover and identify the endogenous RNA targets that bind PfSR1. Bioinformatic analysis of these RNAs revealed common sequence motifs potentially recognized by PfSR1. RNA-EMSAs show that PfSR1 preferentially binds RNA molecules containing these motifs. Interestingly, we find that PfSR1 not only regulates AS but also the steady-state levels of mRNAs containing these motifs in vivo.
Affiliation(s)
- Shiri Eshar
- Department of Microbiology and Molecular Genetics, The Kuvin Center for the Study of Infectious and Tropical Diseases, The Institute for Medical Research Israel-Canada, The Hebrew University - Hadassah Medical School, Jerusalem, Israel
- Lindsey Altenhofen
- Department of Molecular Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA; Department of Biochemistry and Molecular Biology, Department of Chemistry and Center for Malaria Research, Pennsylvania State University, State College, PA, 16802, USA
- Alona Rabner
- Department of Biology, Israel Institute of Technology-Technion, Haifa, Israel
- Phil Ross
- Department of Biochemistry and Molecular Biology, Department of Chemistry and Center for Malaria Research, Pennsylvania State University, State College, PA, 16802, USA
- Yair Fastman
- Department of Microbiology and Molecular Genetics, The Kuvin Center for the Study of Infectious and Tropical Diseases, The Institute for Medical Research Israel-Canada, The Hebrew University - Hadassah Medical School, Jerusalem, Israel
- Rotem Karni
- Department of Biochemistry and Molecular Biology, The Institute for Medical Research Israel-Canada, The Hebrew University - Hadassah Medical School, Jerusalem, Israel
- Manuel Llinás
- Department of Molecular Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA; Department of Biochemistry and Molecular Biology, Department of Chemistry and Center for Malaria Research, Pennsylvania State University, State College, PA, 16802, USA
- Ron Dzikowski
- Department of Microbiology and Molecular Genetics, The Kuvin Center for the Study of Infectious and Tropical Diseases, The Institute for Medical Research Israel-Canada, The Hebrew University - Hadassah Medical School, Jerusalem, Israel
30
31
Molecular characterization of Plasmodium falciparum Bruno/CELF RNA binding proteins. Mol Biochem Parasitol 2014; 198:1-10. [DOI: 10.1016/j.molbiopara.2014.10.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/26/2014] [Revised: 10/13/2014] [Accepted: 10/22/2014] [Indexed: 01/04/2023]
32
Paz I, Kosti I, Ares M, Cline M, Mandel-Gutfreund Y. RBPmap: a web server for mapping binding sites of RNA-binding proteins. Nucleic Acids Res 2014; 42:W361-7. [PMID: 24829458 PMCID: PMC4086114 DOI: 10.1093/nar/gku406] [Citation(s) in RCA: 336] [Impact Index Per Article: 33.6] [Indexed: 01/17/2023]
Abstract
Regulation of gene expression is executed in many cases by RNA-binding proteins (RBPs) that bind to mRNAs as well as to non-coding RNAs. RBPs recognize their RNA targets via specific binding sites on the RNA. Predicting the binding sites of RBPs is known to be a major challenge. We present a new web server, RBPmap, freely accessible through the website http://rbpmap.technion.ac.il/, for accurate prediction and mapping of RBP binding sites. RBPmap has been developed specifically for mapping RBPs in the human, mouse and Drosophila melanogaster genomes, though it supports other organisms too. RBPmap enables users to select motifs from a large database of experimentally defined motifs. In addition, users can provide any motif of interest, given as either a consensus or a PSSM (position-specific scoring matrix). The algorithm for mapping the motifs is based on a Weighted-Rank approach, which considers the clustering propensity of the binding sites and the overall tendency of regulatory regions to be conserved. In addition, RBPmap incorporates a position-specific background model, designed uniquely for different genomic regions, such as splice sites, 5′ and 3′ UTRs, non-coding RNA and intergenic regions. RBPmap was tested on high-throughput RNA-binding experiments and proved to be highly accurate.
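The basic operation underneath PSSM-based mapping can be sketched as a log-odds sliding-window scan. This is a minimal sketch, not RBPmap's method: it assumes a uniform background and a fixed pseudocount, and it omits the Weighted-Rank clustering and conservation terms and the region-specific background model described above. The function names and the single-site motif in the usage line are hypothetical.

```python
import math

BASES = "ACGU"

def pwm_from_sites(sites, pseudocount=0.5):
    """Build a per-position probability matrix from aligned, equal-length sites.

    A pseudocount is added to every base so no probability is zero
    (otherwise the log-odds score would be undefined).
    """
    sites = [s.upper().replace("T", "U") for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: c / total for b, c in counts.items()})
    return matrix

def pssm_scores(seq, pwm, background=None):
    """Log2-odds score of every window of len(pwm) along seq (DNA T is read as U)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    seq = seq.upper().replace("T", "U")
    width = len(pwm)
    return [
        sum(math.log2(pwm[i][base] / background[base])
            for i, base in enumerate(seq[start:start + width]))
        for start in range(len(seq) - width + 1)
    ]

def hits(seq, pwm, threshold):
    """0-based start positions whose window score reaches the threshold."""
    return [i for i, s in enumerate(pssm_scores(seq, pwm)) if s >= threshold]

pwm = pwm_from_sites(["UGCAUG"])                 # hypothetical one-site motif
print(hits("AAUGCAUGAA", pwm, threshold=5.0))    # [2]
```

With one site and a pseudocount of 0.5, each matching base contributes exactly 1 bit, so a perfect 6-mer match scores 6.0; a real tool would also calibrate the threshold against a background distribution of scores.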
Affiliation(s)
- Inbal Paz
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Idit Kosti
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
- Manuel Ares
- Department of Molecular, Cellular and Developmental Biology, UCSC, Santa Cruz, CA, USA
- Melissa Cline
- Center for Biomolecular Science & Engineering, UCSC, Santa Cruz, CA, USA
- Yael Mandel-Gutfreund
- Department of Biology, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
33
Backofen R, Vogel T. Biological and bioinformatical approaches to study crosstalk of long-non-coding RNAs and chromatin-modifying proteins. Cell Tissue Res 2014; 356:507-26. [PMID: 24820400 DOI: 10.1007/s00441-014-1885-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/16/2014] [Accepted: 03/27/2014] [Indexed: 02/04/2023]
Abstract
Long non-coding RNA (lncRNA) regulates gene expression through transcriptional and epigenetic regulation as well as alternative splicing in the nucleus. In addition, regulation is achieved at the levels of mRNA translation, storage and degradation in the cytoplasm. In recent years, several studies have described the interaction of lncRNAs with enzymes that confer so-called epigenetic modifications, such as DNA methylation, histone modifications and chromatin structure or remodelling. LncRNA interaction with chromatin-modifying enzymes (CMEs) is an emerging field that adds another layer of complexity to transcriptional regulation. Given that CME-lncRNA interactions have been identified in many biological processes, ranging from development to disease, a comprehensive understanding of the underlying mechanisms is important to inspire basic and translational research in the future. In this review, we highlight recent findings that extend our understanding of the functional interdependencies between lncRNAs and CMEs that activate or repress gene expression. We focus on recent highlights of molecular and functional roles for CME-lncRNAs and provide an interdisciplinary overview of recent technical and methodological developments that have improved biological and bioinformatical approaches for the detection and functional study of CME-lncRNA interactions.
Affiliation(s)
- Rolf Backofen
- Institute of Computer Science, Albert-Ludwigs-University, Freiburg, Germany
34
Ma Q, Zhang H, Mao X, Zhou C, Liu B, Chen X, Xu Y. DMINDA: an integrated web server for DNA motif identification and analyses. Nucleic Acids Res 2014; 42:W12-9. [PMID: 24753419 PMCID: PMC4086085 DOI: 10.1093/nar/gku315] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/21/2022]
Abstract
DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, accessible at http://csbl.bmb.uga.edu/DMINDA/. The website is freely available to all users and requires no login. The server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important for elucidating the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences, along with statistical scores for the predicted motifs derived from information extracted from a control set; (ii) scanning provided genomic sequences for instances of a query motif; (iii) comparison and clustering of identified motifs; and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a back-end computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular.
Affiliation(s)
- Qin Ma
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; BioEnergy Science Center (BESC), Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Hanyuan Zhang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; College of Computer Science and Technology, Jilin University, Changchun, China
- Xizeng Mao
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; BioEnergy Science Center (BESC), Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Chuan Zhou
- School of Mathematics, Shandong University, Jinan, Shandong, China
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, China
- Xin Chen
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; College of Computer Science and Technology, Jilin University, Changchun, China
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; BioEnergy Science Center (BESC), Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA; College of Computer Science and Technology, Jilin University, Changchun, China
35
Leibovich L, Yakhini Z. Mutual enrichment in ranked lists and the statistical assessment of position weight matrix motifs. Algorithms Mol Biol 2014; 9:11. [PMID: 24708618 PMCID: PMC4021615 DOI: 10.1186/1748-7188-9-11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/01/2013] [Accepted: 03/30/2014] [Indexed: 11/18/2022]
Abstract
Background: Statistics on ranked lists are useful in analysing molecular biology measurement data, such as differential expression, which results in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists of sequences. More flexible models such as position weight matrix (PWM) motifs are more challenging in this context, partially because it is not clear how to avoid the use of arbitrary thresholds. Results: To assess the enrichment of a PWM motif in a ranked list, we use a second ranking on the same set of elements induced by the PWM. Possible orders of one ranked list relative to another can be modelled as permutations. Due to sample space complexity, it is difficult to accurately characterize tail distributions in the group of permutations. In this paper, we develop tight upper bounds on tail distributions of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We further demonstrate advantages of this approach using our publicly available software implementation, mmHG-Finder, to study PWM motifs in several datasets. In addition to validating known motifs, we found GC-rich strings to be enriched amongst the promoter sequences of long non-coding RNAs that are specifically expressed in thyroid and prostate tissue samples, and observed a statistical association with tissue-specific CpG hypo-methylation. Conclusions: We develop tight bounds that can be calculated in polynomial time. We demonstrate the utility of mutual enrichment in motif search and assess performance on synthetic and biological datasets. We suggest that thyroid- and prostate-specific long non-coding RNAs are regulated by transcription factors that bind GC-rich sequences, such as EGR1, SP1 and E2F3. We further suggest that this regulation is associated with DNA hypo-methylation.
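The mutual-enrichment idea can be illustrated by brute force: for two rankings of the same N items, take every pair of cutoffs (a, b), count the overlap k between the top a of one list and the top b of the other, and compute the hypergeometric tail P(X ≥ k) expected if the two orders were independent. The minimum over all cutoff pairs is the statistic. This is a hedged sketch with hypothetical names, not mmHG-Finder: it only computes the raw minimum, which is not itself a calibrated p-value because it is taken over many cutoff pairs, and it does not implement the tail bounds the abstract describes.

```python
from math import comb

def hg_tail(k, n_total, n_marked, n_drawn):
    """P(X >= k) for X ~ Hypergeometric(n_total items, n_marked marked, n_drawn draws)."""
    upper = min(n_marked, n_drawn)
    favourable = sum(comb(n_marked, i) * comb(n_total - n_marked, n_drawn - i)
                     for i in range(k, upper + 1))
    return favourable / comb(n_total, n_drawn)

def mmhg_statistic(rank1, rank2):
    """Minimum hypergeometric tail over all cutoff pairs (a, b).

    rank1 and rank2 order the same distinct items from top to bottom.
    Returns (min_tail, a, b), or (1.0, 0, 0) if no pair scores below 1.0.
    """
    if (len(rank1) != len(rank2) or set(rank1) != set(rank2)
            or len(set(rank1)) != len(rank1)):
        raise ValueError("rankings must contain exactly the same distinct items")
    n = len(rank1)
    best = (1.0, 0, 0)
    for a in range(1, n):
        top_a = set(rank1[:a])
        k = 0  # overlap between top_a and the top-b prefix of rank2
        for b in range(1, n):
            if rank2[b - 1] in top_a:  # extend the rank2 prefix by one item
                k += 1
            tail = hg_tail(k, n, a, b)
            if tail < best[0]:
                best = (tail, a, b)
    return best

# Identical toy rankings: the best cutoff pair is a = b = 2, with tail 1/6.
print(mmhg_statistic(list("abcd"), list("abcd")))
```

The double loop reuses the running overlap count k as b grows, so each (a, b) pair costs one tail evaluation; this is fine for small lists but is only meant to make the definition concrete.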
36
Beier R, Boschke E, Labudde D. New strategies for evaluation and analysis of SELEX experiments. Biomed Res Int 2014; 2014:849743. [PMID: 24779017 PMCID: PMC3977542 DOI: 10.1155/2014/849743] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 10/04/2013] [Accepted: 01/28/2014] [Indexed: 12/04/2022]
Abstract
Aptamers are an interesting alternative to antibodies in pharmaceutics and biosensorics, because they are able to bind a multitude of possible target molecules with high affinity. Therefore, the process of finding such aptamers, which is commonly a SELEX screening process, becomes crucial. The standard SELEX procedure schedules the validation of the aptamers found via binding experiments, which does not yield a detailed characterization of aptamer enrichment during the screening. For advanced analysis of the accrued enrichment within the SELEX library, we used sequence information gathered by next-generation sequencing in addition to the standard SELEX procedure. Because sequence motifs are one way to describe enrichment, there is a need to find recurring sequence motifs corresponding to substructures within the aptamers that characteristically fit specific binding sites of the target. In this paper, a motif search algorithm is presented that helps describe aptamer enrichment in more detail. The extensive characterization of target and binding aptamers may later reveal a functional connection between these molecules, which can be modeled and used to optimize future SELEX runs through the generation of target-specific starting libraries.
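One simple, generic way to quantify enrichment from sequencing reads of two SELEX rounds is to compare k-mer frequencies between an early and a late round. The sketch below does that with a pseudocount; it is an illustrative assumption, not the motif search algorithm presented in the paper, and the function names and toy reads are hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_enrichment(early_reads, late_reads, k, pseudocount=1.0):
    """Rank k-mers by smoothed frequency in a late round relative to an early round.

    The pseudocount is spread over the union of observed k-mers, so k-mers absent
    from one round still receive a finite ratio.
    """
    early = kmer_counts(early_reads, k)
    late = kmer_counts(late_reads, k)
    vocab = set(early) | set(late)
    n_early = sum(early.values()) + pseudocount * len(vocab)
    n_late = sum(late.values()) + pseudocount * len(vocab)
    ratios = {
        kmer: ((late[kmer] + pseudocount) / n_late)
              / ((early[kmer] + pseudocount) / n_early)
        for kmer in vocab
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

round1 = ["ACGTACGT", "TTTTCCCC"]   # toy reads, not real data
round5 = ["GGACGGAC", "AGGACTTT"]
print(kmer_enrichment(round1, round5, k=4)[0][0])   # GGAC
```

A fixed-length k-mer ratio ignores mismatches and structure, which is one reason dedicated SELEX motif searches go further than this.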
Affiliation(s)
- Rico Beier
- Bioinformatics Group, Department of Mathematics, Natural and Computer Sciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
- Elke Boschke
- Institute of Food Technology and Bioprocess Engineering, Department of Mechanical Engineering, Dresden University of Technology, 01062 Dresden, Germany
- Dirk Labudde
- Bioinformatics Group, Department of Mathematics, Natural and Computer Sciences, University of Applied Sciences Mittweida, 09648 Mittweida, Germany
37
Maticzka D, Lange SJ, Costa F, Backofen R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol 2014; 15:R17. [PMID: 24451197 PMCID: PMC4053806 DOI: 10.1186/gb-2014-15-1-r17] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 08/07/2013] [Accepted: 01/22/2014] [Indexed: 12/01/2022]
Abstract
We present GraphProt, a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data. We benchmark GraphProt, demonstrating that the modeled binding preferences conform to the literature, and showcase the biological relevance and two applications of GraphProt models. First, estimated binding affinities correlate with experimental measurements. Second, predicted Ago2 targets display higher levels of expression upon Ago2 knockdown, whereas control targets do not. Computational binding models, such as those provided by GraphProt, are essential for predicting RBP binding sites and affinities in all tissues. GraphProt is freely available at http://www.bioinf.uni-freiburg.de/Software/GraphProt.