1. Lang B, Yang JS, Garriga-Canut M, Speroni S, Aschern M, Gili M, Hoffmann T, Tartaglia GG, Maurer SP. Matrix-screening reveals a vast potential for direct protein-protein interactions among RNA binding proteins. Nucleic Acids Res 2021; 49:6702-6721. [PMID: 34133714 PMCID: PMC8266617 DOI: 10.1093/nar/gkab490] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2020] [Revised: 04/23/2021] [Accepted: 05/20/2021] [Indexed: 01/02/2023] [Open Access]
Abstract
RNA-binding proteins (RBPs) are crucial factors of post-transcriptional gene regulation and their modes of action are intensely investigated. At the center of attention are RNA motifs that guide where RBPs bind. However, sequence motifs are often poor predictors of RBP-RNA interactions in vivo. It is hence believed that many RBPs recognize RNAs as complexes, to increase specificity and regulatory possibilities. To probe the potential for complex formation among RBPs, we assembled a library of 978 mammalian RBPs and used rec-Y2H matrix screening to detect direct interactions between RBPs, sampling > 600 K interactions. We discovered 1994 new interactions and demonstrate that interacting RBPs bind RNAs adjacently in vivo. We further find that the mRNA binding region and motif preferences of RBPs deviate, depending on their adjacently binding interaction partners. Finally, we reveal novel RBP interaction networks among major RNA processing steps and show that splicing impairing RBP mutations observed in cancer rewire spliceosomal interaction networks. The dataset we provide will be a valuable resource for understanding the combinatorial interactions of RBPs with RNAs and the resulting regulatory outcomes.
Affiliation(s)
- Benjamin Lang
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, Barcelona 08003, Spain
  - Department of Structural Biology and Center of Excellence for Data-Driven Discovery, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Jae-Seong Yang
  - Centre de Recerca en Agrigenòmica, Consortium CSIC-IRTA-UAB-UB (CRAG), Cerdanyola del Vallès, 08193 Barcelona, Spain
- Mireia Garriga-Canut
  - Division of Engineering, New York University Abu Dhabi (NYUAD), Abu Dhabi 129188, UAE
- Silvia Speroni
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, Barcelona 08003, Spain
- Moritz Aschern
  - Centre de Recerca en Agrigenòmica, Consortium CSIC-IRTA-UAB-UB (CRAG), Cerdanyola del Vallès, 08193 Barcelona, Spain
- Maria Gili
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, Barcelona 08003, Spain
- Tobias Hoffmann
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, Barcelona 08003, Spain
- Gian Gaetano Tartaglia
  - Center for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152, Genoa, Italy
  - Biology and Biotechnology Department "Charles Darwin", Sapienza University of Rome, P.le A. Moro 5, Rome 00185, Italy
- Sebastian P Maurer
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Doctor Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Department of Experimental and Health Sciences, Barcelona, Spain

2. Armaos A, Zacco E, Sanchez de Groot N, Tartaglia GG. RNA-protein interactions: Central players in coordination of regulatory networks. Bioessays 2020; 43:e2000118. [PMID: 33284474 DOI: 10.1002/bies.202000118] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/18/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 12/12/2022]
Abstract
Changes in the abundance of protein and RNA molecules can impair the formation of complexes in the cell, leading to toxicity and death. Here we exploit the information contained in protein, RNA and DNA interaction networks to provide a comprehensive view of the regulation layers controlling the concentration-dependent formation of assemblies in the cell. We present the emerging concept that RNAs can act as scaffolds to promote the formation of ribonucleoprotein complexes and coordinate the post-transcriptional layer of gene regulation. We describe the structural and interaction network properties that characterize the ability of protein and RNA molecules to interact and phase separate in liquid-like compartments. Finally, we show that the presence of structurally disordered regions in proteins correlates with the propensity to undergo liquid-to-solid phase transitions and cause human diseases. Also see the video abstract here: https://youtu.be/kfpqibsNfS0.
Affiliation(s)
- Alexandros Armaos
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Center for Human Technologies, Istituto Italiano di Tecnologia, Genova, Italy
- Elsa Zacco
  - Center for Human Technologies, Istituto Italiano di Tecnologia, Genova, Italy
- Natalia Sanchez de Groot
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Gian Gaetano Tartaglia
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Center for Human Technologies, Istituto Italiano di Tecnologia, Genova, Italy
  - Department of Biology 'Charles Darwin', Sapienza University of Rome, Rome, Italy
  - Institucio Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

3. Zhang Y, Qiao S, Ji S, Li Y. DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-00990-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 12/19/2022]

4. Chauhan S, Ahmad S. Enabling full-length evolutionary profiles based deep convolutional neural network for predicting DNA-binding proteins from sequence. Proteins 2019; 88:15-30. [DOI: 10.1002/prot.25763] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/10/2019] [Revised: 06/01/2019] [Accepted: 06/15/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Sucheta Chauhan
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India

5. Lee W, Park B, Han K. Sequence-based prediction of putative transcription factor binding sites in DNA sequences of any length. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 15:1461-1469. [PMID: 29990126 DOI: 10.1109/tcbb.2017.2773075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
A transcription factor (TF) is a protein that regulates gene expression by binding to specific DNA sequences. Despite recent advances in experimental techniques for identifying transcription factor binding sites (TFBS) in DNA sequences, a large number of TFBS remain to be unveiled in many species. Several computational methods developed for predicting TFBS in DNA are tissue- or species-specific, so they cannot be used without prior knowledge of the tissue or species. Some computational methods are applicable to finding TFBS in short DNA sequences only. In this paper we propose a new learning method for predicting TFBS in DNA of any length using the composition, transition and distribution of nucleotides and amino acids in DNA and TF sequences. In independent testing of the method on datasets that were not used in training, its accuracy and Matthews correlation coefficient (MCC) were as high as 81.84% and 0.634, respectively. The proposed method can be a useful aid for selecting potential TFBS in a large amount of DNA sequences before conducting biochemical experiments to empirically determine TFBS. The program and data sets are available at http://bclab.inha.ac.kr/TFbinding.
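For readers unfamiliar with the two metrics quoted in this abstract, the sketch below shows how accuracy and MCC are conventionally computed from a binary confusion matrix. It is a minimal illustration only: the function name `accuracy_and_mcc` and the example counts are hypothetical and are not taken from the paper or its software.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Standard MCC denominator; it is zero when any row or column
    # of the confusion matrix is empty, in which case MCC is
    # conventionally reported as 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts chosen only to illustrate the calculation.
acc, mcc = accuracy_and_mcc(tp=80, tn=85, fp=15, fn=20)
print(f"accuracy={acc:.4f}, MCC={mcc:.3f}")
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which is why it is often reported alongside accuracy when binding and non-binding classes are imbalanced.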

6. Zhang H, Zhu L, Huang DS. WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data. Sci Rep 2017; 7:3217. [PMID: 28607381 PMCID: PMC5468353 DOI: 10.1038/s41598-017-03554-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/15/2016] [Accepted: 05/02/2017] [Indexed: 01/24/2023] [Open Access]
Abstract
Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, computational cost forces most existing DMD methods to adopt approximate schemes that greatly restrict the search space, leading to a significant loss of predictive accuracy. In this paper, we propose Weakly-Supervised Motif Discovery (WSMD) to discover motifs from ChIP-seq datasets. In contrast to the learning strategies adopted by previous DMD methods, WSMD allows a "global" optimization scheme of the motif parameters in continuous space, thereby reducing the information loss of model representation and improving the quality of the resulting motifs. Meanwhile, by exploiting the connection between the DMD framework and existing weakly supervised learning (WSL) technologies, we also present highly scalable learning strategies for the proposed method. The experimental results on both real ChIP-seq datasets and synthetic datasets show that WSMD substantially outperforms former DMD methods (including DREME, HOMER, XXmotif, motifRG and DECOD) in terms of predictive accuracy, while also achieving a competitive computational speed.
Affiliation(s)
- Hongbo Zhang
  - Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
- Lin Zhu
  - Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, 201804, P.R. China

7. Marchese D, de Groot NS, Lorenzo Gotor N, Livi CM, Tartaglia GG. Advances in the characterization of RNA-binding proteins. Wiley Interdisciplinary Reviews. RNA 2016; 7:793-810. [PMID: 27503141 PMCID: PMC5113702 DOI: 10.1002/wrna.1378] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 03/28/2016] [Revised: 06/14/2016] [Accepted: 06/23/2016] [Indexed: 12/14/2022]
Abstract
From transcription, to transport, storage, and translation, RNA depends on association with different RNA-binding proteins (RBPs). Methods based on next-generation sequencing and protein mass-spectrometry have started to unveil genome-wide interactions of RBPs but many aspects still remain out of sight. How many of the binding sites identified in high-throughput screenings are functional? A number of computational methods have been developed to analyze experimental data and to obtain insights into the specificity of protein-RNA interactions. How can theoretical models be exploited to identify RBPs? In addition to oligomeric complexes, protein and RNA molecules can associate into granular assemblies whose physical properties are still poorly understood. What protein features promote granule formation and what effects do these assemblies have on cell function? Here, we describe the newest in silico, in vitro, and in vivo advances in the field of protein-RNA interactions. We also present the challenges that experimental and computational approaches will have to face in future studies.
Affiliation(s)
- Domenica Marchese
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Natalia Sanchez de Groot
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Nieves Lorenzo Gotor
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Carmen Maria Livi
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - IFOM Foundation, FIRC Institute of Molecular Oncology Foundation, Milan, Italy
- Gian G Tartaglia
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

8. Lin D, Zhang J, Li J, Xu C, Deng HW, Wang YP. An integrative imputation method based on multi-omics datasets. BMC Bioinformatics 2016; 17:247. [PMID: 27329642 PMCID: PMC4915152 DOI: 10.1186/s12859-016-1122-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/30/2015] [Accepted: 06/05/2016] [Indexed: 12/26/2022] [Open Access]
Abstract
Background: Integrative analysis of multi-omics data is becoming increasingly important to unravel functional mechanisms of complex diseases. However, the currently available multi-omics datasets inevitably suffer from missing values due to technical limitations and various constraints in experiments. These missing values severely hinder integrative analysis of multi-omics data. Current imputation methods mainly focus on using single omics data while ignoring biological interconnections and information embedded in multi-omics data sets. Results: In this study, a novel multi-omics imputation method was proposed to integrate multiple correlated omics datasets for improving the imputation accuracy. Our method was designed to: 1) combine the estimates of each missing value from the individual omics data itself as well as from other omics, and 2) simultaneously impute multiple missing omics datasets by an iterative algorithm. We compared our method with five imputation methods using single omics data at different noise levels, sample sizes and data missing rates. The results demonstrated the advantage and efficiency of our method, consistently in terms of the imputation error and the recovery of mRNA-miRNA network structure. Conclusions: We concluded that our proposed imputation method can utilize more biological information to minimize the imputation error and thus can improve the performance of downstream analysis such as genetic regulatory network construction.
Affiliation(s)
- Dongdong Lin
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Jigang Zhang
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Jingyao Li
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Chao Xu
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Hong-Wen Deng
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA
  - Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA