Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ghanbari M, Ohler U. Deep neural networks for interpreting RNA-binding protein target preferences. Genome Res 2020;30:214-226. [PMID: 31992613 PMCID: PMC7050519 DOI: 10.1101/gr.247494.118] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 12/18/2018] [Accepted: 01/07/2020] [Indexed: 11/29/2022]

Cited by Other Article(s)

Qiao Y, Yang R, Liu Y, Chen J, Zhao L, Huo P, Wang Z, Bu D, Wu Y, Zhao Y. DeepFusion: A deep bimodal information fusion network for unraveling protein-RNA interactions using in vivo RNA structures. Comput Struct Biotechnol J 2024;23:617-625. [PMID: 38274994 PMCID: PMC10808905 DOI: 10.1016/j.csbj.2023.12.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/04/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open

Rakowski A, Monti R, Huryn V, Lemanczyk M, Ohler U, Lippert C. Metadata-guided feature disentanglement for functional genomics. Bioinformatics 2024;40:ii4-ii10. [PMID: 39230700 PMCID: PMC11373386 DOI: 10.1093/bioinformatics/btae403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open

Hervoso JL, Amoah K, Dodson J, Choudhury M, Bhattacharya A, Quinones-Valdez G, Pasaniuc B, Xiao X. Splicing-specific transcriptome-wide association uncovers genetic mechanisms for schizophrenia. Am J Hum Genet 2024;111:1573-1587. [PMID: 38925119 PMCID: PMC11339621 DOI: 10.1016/j.ajhg.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 05/28/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024] Open

Abstract

Recent studies have highlighted the essential role of RNA splicing, a key mechanism of alternative RNA processing, in establishing connections between genetic variations and disease. Genetic loci influencing RNA splicing variations show considerable influence on complex traits, possibly surpassing those affecting total gene expression. Dysregulated RNA splicing has emerged as a major potential contributor to neurological and psychiatric disorders, likely due to the exceptionally high prevalence of alternatively spliced genes in the human brain. Nevertheless, establishing direct associations between genetically altered splicing and complex traits has remained an enduring challenge. We introduce Spliced-Transcriptome-Wide Associations (SpliTWAS) to integrate alternative splicing information with genome-wide association studies to pinpoint genes linked to traits through exon splicing events. We applied SpliTWAS to two schizophrenia (SCZ) RNA-sequencing datasets, BrainGVEX and CommonMind, revealing 137 and 88 trait-associated exons (in 84 and 67 genes), respectively. Enriched biological functions in the associated gene sets converged on neuronal function and development, immune cell activation, and cellular transport, which are highly relevant to SCZ. SpliTWAS variants impacted RNA-binding protein binding sites, revealing potential disruption of RNA-protein interactions affecting splicing. We extended the probabilistic fine-mapping method FOCUS to the exon level, identifying 36 genes and 48 exons as putatively causal for SCZ. We highlight VPS45 and APOPT1, where splicing of specific exons was associated with disease risk, eluding detection by conventional gene expression analysis. Collectively, this study supports the substantial role of alternative splicing in shaping the genetic basis of SCZ, providing a valuable approach for future investigations in this area.

Sokolova K, Chen KM, Hao Y, Zhou J, Troyanskaya OG. Deep Learning Sequence Models for Transcriptional Regulation. Annu Rev Genomics Hum Genet 2024;25:105-122. [PMID: 38594933 DOI: 10.1146/annurev-genom-021623-024727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2024]

Rennie S. Deep Learning for Elucidating Modifications to RNA-Status and Challenges Ahead. Genes (Basel) 2024;15:629. [PMID: 38790258 PMCID: PMC11121098 DOI: 10.3390/genes15050629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/11/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open

Fu T, Amoah K, Chan TW, Bahn JH, Lee JH, Terrazas S, Chong R, Kosuri S, Xiao X. Massively parallel screen uncovers many rare 3' UTR variants regulating mRNA abundance of cancer driver genes. Nat Commun 2024;15:3335. [PMID: 38637555 PMCID: PMC11026479 DOI: 10.1038/s41467-024-46795-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 03/06/2024] [Indexed: 04/20/2024] Open

Affiliation(s)

Ting Fu Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Kofi Amoah Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Tracey W Chan Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Jae Hoon Bahn Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Jae-Hyung Lee Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA Department of Life and Nanopharmaceutical Sciences & Oral Microbiology, School of Dentistry, Kyung Hee University, Seoul, South Korea
Sari Terrazas Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA Molecular Biology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Rockie Chong Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Sriram Kosuri Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Xinshu Xiao Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA. Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA. Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA. Molecular Biology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA. Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA.

Wu H, Liu X, Fang Y, Yang Y, Huang Y, Pan X, Shen HB. Decoding protein binding landscape on circular RNAs with base-resolution transformer models. Comput Biol Med 2024;171:108175. [PMID: 38402841 DOI: 10.1016/j.compbiomed.2024.108175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/16/2024] [Accepted: 02/18/2024] [Indexed: 02/27/2024]

Toussaint PA, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A. Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform 2023;25:bbad453. [PMID: 38113073 PMCID: PMC10729786 DOI: 10.1093/bib/bbad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 07/28/2023] [Accepted: 11/08/2023] [Indexed: 12/21/2023] Open

Vaculík O, Chalupová E, Grešová K, Majtner T, Alexiou P. Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes. Biology (Basel) 2023;12:1276. [PMID: 37886986 PMCID: PMC10604046 DOI: 10.3390/biology12101276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/19/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]

Horlacher M, Cantini G, Hesse J, Schinke P, Goedert N, Londhe S, Moyon L, Marsico A. A systematic benchmark of machine learning methods for protein-RNA interaction prediction. Brief Bioinform 2023;24:bbad307. [PMID: 37635383 PMCID: PMC10516373 DOI: 10.1093/bib/bbad307] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/31/2023] [Revised: 06/15/2023] [Accepted: 07/18/2023] [Indexed: 08/29/2023] Open

Wang Y, Wei Z, Su J, Coenen F, Meng J. RgnTX: Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity. Comput Struct Biotechnol J 2023;21:4110-4117. [PMID: 37671241 PMCID: PMC10475473 DOI: 10.1016/j.csbj.2023.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 08/13/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023] Open

Monti R, Ohler U. Toward Identification of Functional Sequences and Variants in Noncoding DNA. Annu Rev Biomed Data Sci 2023;6:191-210. [PMID: 37262323 DOI: 10.1146/annurev-biodatasci-122120-110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]

Horlacher M, Wagner N, Moyon L, Kuret K, Goedert N, Salvatore M, Ule J, Gagneur J, Winther O, Marsico A. Towards in silico CLIP-seq: predicting protein-RNA interaction via sequence-to-signal learning. Genome Biol 2023;24:180. [PMID: 37542318 PMCID: PMC10403857 DOI: 10.1186/s13059-023-03015-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/23/2022] [Accepted: 07/17/2023] [Indexed: 08/06/2023] Open

Boyle EA, Her HL, Mueller JR, Naritomi JT, Nguyen GG, Yeo GW. Skipper analysis of eCLIP datasets enables sensitive detection of constrained translation factor binding sites. Cell Genom 2023;3:100317. [PMID: 37388912 PMCID: PMC10300551 DOI: 10.1016/j.xgen.2023.100317] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/20/2022] [Revised: 02/17/2023] [Accepted: 04/06/2023] [Indexed: 07/01/2023]

Street L, Rothamel K, Brannan K, Jin W, Bokor B, Dong K, Rhine K, Madrigal A, Al-Azzam N, Kim JK, Ma Y, Abdou A, Wolin E, Doron-Mandel E, Ahdout J, Mujumdar M, Jovanovic M, Yeo GW. Large-scale map of RNA binding protein interactomes across the mRNA life-cycle. bioRxiv 2023:2023.06.08.544225. [PMID: 37333282 PMCID: PMC10274859 DOI: 10.1101/2023.06.08.544225] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/20/2023]

Affiliation(s)

Lena Street (these authors contributed equally) Department of Biological Sciences, Columbia University, New York, NY, USA
Katherine Rothamel (these authors contributed equally) Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Kristopher Brannan (these authors contributed equally) Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA Center for RNA Therapeutics, Houston Methodist Research Institute, Houston, TX, USA Department of Cardiovascular Sciences, Houston Methodist Research Institute, Houston, TX, USA
Wenhao Jin Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Benjamin Bokor Department of Biological Sciences, Columbia University, New York, NY, USA
Kevin Dong Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Kevin Rhine Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Assael Madrigal Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Norah Al-Azzam Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Jenny Kim Kim Department of Biological Sciences, Columbia University, New York, NY, USA
Yanzhe Ma Department of Biological Sciences, Columbia University, New York, NY, USA
Ahmed Abdou Department of Biological Sciences, Columbia University, New York, NY, USA
Erica Wolin Department of Biological Sciences, Columbia University, New York, NY, USA
Ella Doron-Mandel Department of Biological Sciences, Columbia University, New York, NY, USA
Joshua Ahdout Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Mayuresh Mujumdar Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Marko Jovanovic Department of Biological Sciences, Columbia University, New York, NY, USA
Gene W Yeo Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA

Wang Q, Xu T, Xu K, Lu Z, Ying J. Prediction of transport proteins from sequence information with the deep learning approach. Comput Biol Med 2023;160:106974. [PMID: 37167658 DOI: 10.1016/j.compbiomed.2023.106974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 04/17/2023] [Accepted: 04/22/2023] [Indexed: 05/13/2023]

Wang X, Zhang M, Long C, Yao L, Zhu M. Self-Attention Based Neural Network for Predicting RNA-Protein Binding Sites. IEEE/ACM Trans Comput Biol Bioinform 2023;20:1469-1479. [PMID: 36067103 DOI: 10.1109/tcbb.2022.3204661] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]

Horlacher M, Oleshko S, Hu Y, Ghanbari M, Cantini G, Schinke P, Vergara EE, Bittner F, Mueller NS, Ohler U, Moyon L, Marsico A. A computational map of the human-SARS-CoV-2 protein-RNA interactome predicted at single-nucleotide resolution. NAR Genom Bioinform 2023;5:lqad010. [PMID: 36814457 PMCID: PMC9940458 DOI: 10.1093/nargab/lqad010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Revised: 01/10/2023] [Accepted: 02/14/2023] [Indexed: 02/22/2023] Open

Koo PK, Ploenzke M, Anand P, Paul S, Majdandzic A. ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks. Methods Mol Biol 2023;2586:197-215. [PMID: 36705906 DOI: 10.1007/978-1-0716-2768-6_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]

Agarwal V, Kelley DR. The genetic and biochemical determinants of mRNA degradation rates in mammals. Genome Biol 2022;23:245. [PMID: 36419176 PMCID: PMC9684954 DOI: 10.1186/s13059-022-02811-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 03/18/2022] [Accepted: 11/02/2022] [Indexed: 11/26/2022] Open

Abstract

BACKGROUND

Degradation rate is a fundamental aspect of mRNA metabolism, and the factors governing it remain poorly characterized. Understanding the genetic and biochemical determinants of mRNA half-life would enable more precise identification of variants that perturb gene expression through post-transcriptional gene regulatory mechanisms.

RESULTS

We establish a compendium of 39 human and 27 mouse transcriptome-wide mRNA decay rate datasets. A meta-analysis of these data identified a prevalence of technical noise and measurement bias, induced partially by the underlying experimental strategy. Correcting for these biases allowed us to derive more precise, consensus measurements of half-life which exhibit enhanced consistency between species. We trained substantially improved statistical models based upon genetic and biochemical features to better predict half-life and characterize the factors molding it. Our state-of-the-art model, Saluki, is a hybrid convolutional and recurrent deep neural network which relies only upon an mRNA sequence annotated with coding frame and splice sites to predict half-life (r=0.77). The key novel principle learned by Saluki is that the spatial positioning of splice sites, codons, and RNA-binding motifs within an mRNA is strongly associated with mRNA half-life. Saluki predicts the impact of RNA sequences and genetic mutations therein on mRNA stability, in agreement with functional measurements derived from massively parallel reporter assays.

CONCLUSIONS

Our work produces a more robust ground truth for transcriptome-wide mRNA half-lives in mammalian cells. Using these revised measurements, we trained Saluki, a model that is over 50% more accurate in predicting half-life from sequence than existing models. Saluki succinctly captures many of the known determinants of mRNA half-life and can be rapidly deployed to predict the functional consequences of arbitrary mutations in the transcriptome.

Huang D, Chen K, Song B, Wei Z, Su J, Coenen F, de Magalhães JP, Rigden DJ, Meng J. Geographic encoding of transcripts enabled high-accuracy and isoform-aware deep learning of RNA methylation. Nucleic Acids Res 2022;50:10290-10310. [PMID: 36155798 PMCID: PMC9561283 DOI: 10.1093/nar/gkac830] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/21/2022] [Revised: 08/26/2022] [Accepted: 09/15/2022] [Indexed: 12/25/2022] Open

Cortés-López M, Schulz L, Enculescu M, Paret C, Spiekermann B, Quesnel-Vallières M, Torres-Diz M, Unic S, Busch A, Orekhova A, Kuban M, Mesitov M, Mulorz MM, Shraim R, Kielisch F, Faber J, Barash Y, Thomas-Tikhonenko A, Zarnack K, Legewie S, König J. High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance. Nat Commun 2022;13:5570. [PMID: 36138008 PMCID: PMC9500061 DOI: 10.1038/s41467-022-31818-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/07/2021] [Accepted: 07/05/2022] [Indexed: 11/29/2022] Open

Affiliation(s)

Mariela Cortés-López Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Laura Schulz Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Mihaela Enculescu Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Claudia Paret Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
Bea Spiekermann Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Mathieu Quesnel-Vallières Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Biochemistry and Biophysics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
Manuel Torres-Diz Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
Sebastian Unic Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
Anke Busch Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Anna Orekhova Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Monika Kuban Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
Mikhail Mesitov Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Miriam M Mulorz Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Rawan Shraim Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
Fridolin Kielisch Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
Jörg Faber Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
Yoseph Barash Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
Andrei Thomas-Tikhonenko Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
Kathi Zarnack Buchmann Institute for Molecular Life Sciences (BMLS), Max-von-Laue-Str. 15, 60438, Frankfurt, Germany; Faculty Biological Sciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438, Frankfurt, Germany.
Stefan Legewie Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany; Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany; Stuttgart Research Center for Systems Biology (SRCSB), University of Stuttgart, Stuttgart, Germany.
Julian König Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.

Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes. Nat Commun 2022;13:5332. [PMID: 36088354 PMCID: PMC9464252 DOI: 10.1038/s41467-022-32864-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 08/22/2022] [Indexed: 12/05/2022] Open

Ma H, Wen H, Xue Z, Li G, Zhang Z. RNANetMotif: Identifying sequence-structure RNA network motifs in RNA-protein binding sites. PLoS Comput Biol 2022;18:e1010293. [PMID: 35819951 PMCID: PMC9275694 DOI: 10.1371/journal.pcbi.1010293] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/16/2021] [Accepted: 06/09/2022] [Indexed: 11/19/2022] Open

Abstract

RNA molecules can adopt stable secondary and tertiary structures, which are essential in mediating physical interactions with other partners such as RNA binding proteins (RBPs) and in carrying out their cellular functions. In vitro and in vivo experiments such as RNAcompete and eCLIP have revealed, respectively, in vitro binding preferences of RBPs to RNA oligomers and in vivo binding sites in cells. Analysis of these binding data showed that the structural properties of the RNAs in these binding sites are important determinants of the binding events; however, it has been a challenge to incorporate the structure information into an interpretable model. Here we describe a new approach, RNANetMotif, which takes the predicted secondary structures of thousands of RNA sequences bound by an RBP as input and uses a graph theory approach to recognize enriched subgraphs. These enriched subgraphs are in essence shared sequence-structure elements that are important in RBP-RNA binding. To validate our approach, we performed RNA structure modeling via coarse-grained molecular dynamics folding simulations for four selected RBPs, and RNA-protein docking for LIN28B. The simulation results, e.g., solvent accessibility and energetics, further support the biological relevance of the discovered network subgraphs.

RNA binding proteins (RBPs) regulate every aspect of RNA biology, including splicing, translation, transportation, and degradation. High-throughput technologies such as eCLIP have identified thousands of binding sites for a given RBP throughout the genome. Earlier studies have shown that, in addition to nucleotide sequences, the structure and conformation of RNAs also play an important role in RBP-RNA interactions. Analogous to protein-protein interactions or protein-DNA interactions, it is likely that there exist intrinsic sequence-structure motifs common to these RNAs that underlie their binding specificity to specific RBPs. It is known that RNAs form energetically favorable secondary structures, which can be represented as graphs, with nucleotides as nodes and with backbone covalent bonds and base-pairing hydrogen bonds as edges. We hypothesize that these graphs can be mined by graph theory approaches to identify sequence-structure motifs as enriched subgraphs. In this article, we describe the details of this approach, termed RNANetMotif, and its associated new concepts, namely the EKS (Extended K-mer Subgraph) and the GraphK graph algorithm. To test the utility of our approach, we conducted 3D structure modeling of selected RNA sequences through molecular dynamics (MD) folding simulation and evaluated the significance of the discovered RNA motifs by comparing their spatial exposure with other regions on the RNA. We believe that this approach has the novelty of treating the RNA sequence as a graph and RBP binding sites as enriched subgraphs, which has broader applications beyond RBP-RNA interactions.
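
The graph representation described in this summary (nucleotides as nodes, backbone bonds and base pairs as edges) can be illustrated with a minimal sketch. It assumes the secondary structure is given in dot-bracket notation without pseudoknots; the function name `dot_bracket_to_graph` and the example sequence are hypothetical and are not taken from RNANetMotif, whose EKS and GraphK steps are not reproduced here.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def dot_bracket_to_graph(seq: str, structure: str) -> Tuple[List[str], Set[Edge], Set[Edge]]:
    """Return (node labels, backbone edges, base-pair edges) for one RNA.

    Node i is labelled with nucleotide seq[i]. Backbone edges connect
    consecutive nucleotides; base-pair edges connect matched parentheses.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    nodes = list(seq)
    backbone = {(i, i + 1) for i in range(len(seq) - 1)}

    pairs: Set[Edge] = set()
    stack: List[int] = []  # positions of unmatched '('
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '(' in structure")

    return nodes, backbone, pairs


# Hypothetical hairpin: a 3-bp stem closing a 3-nt loop.
nodes, backbone, pairs = dot_bracket_to_graph("GGGAAACCC", "(((...)))")
print(sorted(pairs))
```

Keeping the two edge types in separate sets preserves the distinction between covalent and hydrogen-bond edges, which a subgraph-enrichment method would presumably need when comparing sequence-structure elements.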

Barshai M, Aubert A, Orenstein Y. G4detector: Convolutional Neural Network to Predict DNA G-Quadruplexes. IEEE/ACM Trans Comput Biol Bioinform 2022;19:1946-1955. [PMID: 33872156 DOI: 10.1109/tcbb.2021.3073595] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/12/2023]

Du X, Zhao X, Zhang Y. DeepBtoD: Improved RNA-binding proteins prediction via integrated deep learning. J Bioinform Comput Biol 2022;20:2250006. [PMID: 35451938 DOI: 10.1142/s0219720022500068] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

Chalupová E, Vaculík O, Poláček J, Jozefov F, Majtner T, Alexiou P. ENNGene: an Easy Neural Network model building tool for Genomics. BMC Genomics 2022;23:248. [PMID: 35361122 PMCID: PMC8973509 DOI: 10.1186/s12864-022-08414-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/05/2021] [Accepted: 02/23/2022] [Indexed: 11/17/2022] Open

Abstract

Background

The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field.

Results

Here we present ENNGene—Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein.

Conclusions

As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-022-08414-x.

Liu Y, Li R, Luo J, Zhang Z. Inferring RNA-binding protein target preferences using adversarial domain adaptation. PLoS Comput Biol 2022;18:e1009863. [PMID: 35202389 PMCID: PMC8870515 DOI: 10.1371/journal.pcbi.1009863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Accepted: 01/25/2022] [Indexed: 11/18/2022] Open

Abstract

Precise identification of target sites of RNA-binding proteins (RBP) is important to understand their biochemical and cellular functions. A large amount of experimental data is generated by in vivo and in vitro approaches. The binding preferences determined from these platforms share similar patterns but there are discernable differences between these datasets. Computational methods trained on one dataset do not always work well on another dataset. To address this problem, which resembles the classic "domain shift" in deep learning, we adopted the adversarial domain adaptation (ADDA) technique and developed a framework (RBP-ADDA) that can extract RBP binding preferences from an integration of in vivo and in vitro datasets. Compared with conventional methods, ADDA has the advantage of working with two input datasets, as it trains the initial neural network for each dataset individually, projects the two datasets onto a feature space, and uses an adversarial framework to derive a network with optimal discriminative predictive power. In the first step, for each RBP, we include only the in vitro data to pre-train a source network and a task predictor. Next, for the same RBP, we initiate the target network by using the source network and use adversarial domain adaptation to update the target network using both in vitro and in vivo data. These two steps help leverage the in vitro data to improve the prediction on in vivo data, which is typically challenging with a lower signal-to-noise ratio. Finally, to take further advantage of the fused source and target data, we fine-tune the task predictor using both datasets. We showed that RBP-ADDA achieved better performance in modeling in vivo RBP binding data than other existing methods as judged by Pearson correlations. It also improved predictive performance on in vitro datasets. We further applied augmentation operations on RBPs with less in vivo data to expand the input data and showed that it can improve prediction performance. Lastly, we explored the predictive interpretability of RBP-ADDA, where we quantified the contribution of the input features by Integrated Gradients and identified nucleotide positions that are important for RBP recognition.

Yaish O, Orenstein Y. Computational modeling of mRNA degradation dynamics using deep neural networks. Bioinformatics 2022;38:1087-1101. [PMID: 34849591 DOI: 10.1093/bioinformatics/btab800] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Revised: 11/12/2021] [Accepted: 11/22/2021] [Indexed: 02/04/2023] Open

Abstract

MOTIVATION

Messenger RNA (mRNA) degradation plays critical roles in post-transcriptional gene regulation. A major component of mRNA degradation is determined by 3'-UTR elements. Hence, researchers are interested in studying mRNA dynamics as a function of 3'-UTR elements. A recent study measured the mRNA degradation dynamics of tens of thousands of 3'-UTR sequences using a massively parallel reporter assay. However, the computational approach used to model mRNA degradation was based on a simplifying assumption of a linear degradation rate. Consequently, the underlying mechanism of 3'-UTR elements is still not fully understood.

RESULTS

Here, we developed deep neural networks to predict mRNA degradation dynamics and interpreted the networks to identify regulatory elements in the 3'-UTR and their positional effect. Given an input of a 110 nt-long 3'-UTR sequence and an initial mRNA level, the model predicts mRNA levels of eight consecutive time points. Our deep neural networks significantly improved prediction performance of mRNA degradation dynamics compared with extant methods for the task. Moreover, we demonstrated that models predicting the dynamics of two identical 3'-UTR sequences, differing by their poly(A) tail, performed better than single-task models. On the interpretability front, by using Integrated Gradients, our convolutional neural network (CNN) models identified known and novel cis-regulatory sequence elements of mRNA degradation. By applying a novel systematic evaluation of model interpretability, we demonstrated that the recurrent neural network models are inferior to the CNN models in terms of interpretability and that random initialization ensemble improves both prediction and interpretability performance. Moreover, using a mutagenesis analysis, we newly discovered the positional effect of various 3'-UTR elements.
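The mutagenesis analysis mentioned above can be illustrated with a minimal in-silico scan, in which every position of a sequence is substituted with each alternative nucleotide and the change in the predicted value is recorded. This is a sketch under stated assumptions, not the DeepUTR code: `toy_score` is a hypothetical stand-in for a trained network that simply penalizes each occurrence of the AU-rich pentamer AUUUA, and the example sequence is made up.

```python
ALPHABET = "ACGU"

def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq[i:i + len(motif)] == motif)

def toy_score(seq):
    """Hypothetical predictor (assumption): each AUUUA lowers the predicted mRNA level."""
    return 1.0 - 0.2 * count_motif(seq, "AUUUA")

def mutagenesis_scan(seq, score_fn):
    """Return {(position, alt_base): score change} for every single substitution."""
    ref = score_fn(seq)
    effects = {}
    for i, ref_base in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score_fn(mutant) - ref
    return effects

seq = "GGCAUUUAGGC"  # made-up sequence with one AUUUA at positions 3-7
effects = mutagenesis_scan(seq, toy_score)
```

In this toy setting, substitutions inside the motif raise the score while substitutions elsewhere leave it unchanged; plotting such effects by position is one way to read off a positional effect.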

AVAILABILITY AND IMPLEMENTATION

All the code developed through this study is available at github.com/OrensteinLab/DeepUTR/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery. iScience 2021;24:103381. [PMID: 34841226 PMCID: PMC8605353 DOI: 10.1016/j.isci.2021.103381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2021] [Revised: 09/01/2021] [Accepted: 10/27/2021] [Indexed: 11/29/2022] Open

Zhao S, Hamada M. Multi-resBind: a residual network-based multi-label classifier for in vivo RNA binding prediction and preference visualization. BMC Bioinformatics 2021;22:554. [PMID: 34781902 PMCID: PMC8594109 DOI: 10.1186/s12859-021-04430-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Accepted: 10/06/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem via introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to the weak neural network architecture.

RESULTS

Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements using the same large-scale PAR-CLIP dataset with respect to an increase in the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions.

CONCLUSIONS

Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.

Tayara H, Chong KT. Improved Predicting of The Sequence Specificities of RNA Binding Proteins by Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2526-2534. [PMID: 32191896 DOI: 10.1109/tcbb.2020.2981335] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Pezoulas VC, Hazapis O, Lagopati N, Exarchos TP, Goules AV, Tzioufas AG, Fotiadis DI, Stratis IG, Yannacopoulos AN, Gorgoulis VG. Machine Learning Approaches on High Throughput NGS Data to Unveil Mechanisms of Function in Biology and Disease. Cancer Genomics Proteomics 2021;18:605-626. [PMID: 34479914 PMCID: PMC8441762 DOI: 10.21873/cgp.20284] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2021] [Revised: 07/21/2021] [Accepted: 08/03/2021] [Indexed: 12/13/2022] Open

Affiliation(s)

Vasileios C Pezoulas Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
Orsalia Hazapis Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
Nefeli Lagopati Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece Biomedical Research Foundation of the Academy of Athens, Athens, Greece
Themis P Exarchos Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece Department of Informatics, Ionian University, Corfu, Greece
Andreas V Goules Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
Athanasios G Tzioufas Department of Pathophysiology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
Dimitrios I Fotiadis Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, Ioannina, Greece
Ioannis G Stratis Department of Mathematics, National and Kapodistrian University of Athens, Athens, Greece
Athanasios N Yannacopoulos Department of Statistics, and Stochastic Modelling and Applications Laboratory, Athens University of Economics and Business (AUEB), Athens, Greece;
Vassilis G Gorgoulis Molecular Carcinogenesis Group, Department of Histology and Embryology, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece; Biomedical Research Foundation of the Academy of Athens, Athens, Greece Division of Cancer Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, Manchester Cancer Research Centre, NIHR Manchester Biomedical Research Centre, University of Manchester, Manchester, U.K Center for New Biotechnologies and Precision Medicine, Medical School, National and Kapodistrian University of Athens, Athens, Greece Faculty of Health and Medical Sciences, University of Surrey, Surrey, U.K

RNA-Binding Motif Protein 11 (RBM11) Serves as a Prognostic Biomarker and Promotes Ovarian Cancer Progression. DISEASE MARKERS 2021;2021:3037337. [PMID: 34434291 PMCID: PMC8382552 DOI: 10.1155/2021/3037337] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/09/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 01/14/2023]

Wu H, Pan X, Yang Y, Shen HB. Recognizing binding sites of poorly characterized RNA-binding proteins on circular RNAs using attention Siamese network. Brief Bioinform 2021;22:6326526. [PMID: 34297803 DOI: 10.1093/bib/bbab279] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Revised: 06/04/2021] [Accepted: 07/01/2021] [Indexed: 12/24/2022] Open

Sohrabi-Jahromi S, Söding J. Thermodynamic modeling reveals widespread multivalent binding by RNA-binding proteins. Bioinformatics 2021;37:i308-i316. [PMID: 34252974 PMCID: PMC8275352 DOI: 10.1093/bioinformatics/btab300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Boeckel JN, Möbius-Winkler M, Müller M, Rebs S, Eger N, Schoppe L, Tappu R, Kokot KE, Kneuer JM, Gaul S, Bordalo DM, Lai A, Haas J, Ghanbari M, Drewe-Boss P, Liss M, Katus HA, Ohler U, Gotthardt M, Laufs U, Streckfuss-Bömeke K, Meder B. SLM2 Is A Novel Cardiac Splicing Factor Involved in Heart Failure due to Dilated Cardiomyopathy. GENOMICS PROTEOMICS & BIOINFORMATICS 2021;20:129-146. [PMID: 34273561 PMCID: PMC9510876 DOI: 10.1016/j.gpb.2021.01.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/09/2020] [Accepted: 02/01/2021] [Indexed: 01/09/2023]

Affiliation(s)

Jes-Niels Boeckel Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Maximilian Möbius-Winkler Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Marion Müller Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany; Clinic for General and Interventional Cardiology/ Angiology, Herz- und Diabeteszentrum NRW, Ruhr-Universität Bochum, Bad Oeynhausen 32545, Germany
Sabine Rebs Department of Cardiology and Pneumology, University Hospital, Georg-August University Goettingen, Goettingen 37075, Germany; German Center for Cardiovascular Research (DZHK), Partner site Goettingen, Goettingen 37075, Germany
Nicole Eger Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany
Laura Schoppe Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany
Rewati Tappu Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany
Karoline E Kokot Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Jasmin M Kneuer Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Susanne Gaul Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Diana M Bordalo Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany
Alan Lai Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany
Jan Haas Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany
Mahsa Ghanbari Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany; Institute of Biology, Humboldt Universität zu Berlin, Berlin 10099, Germany
Philipp Drewe-Boss Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany; Institute of Biology, Humboldt Universität zu Berlin, Berlin 10099, Germany
Martin Liss Neuromuscular and Cardiovascular Cell Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 13092, Germany; German Center for Cardiovascular Research (DZHK), Partner site Berlin, Berlin 10117, Germany
Hugo A Katus Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany
Uwe Ohler Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 10115, Germany; Institute of Biology, Humboldt Universität zu Berlin, Berlin 10099, Germany
Michael Gotthardt Neuromuscular and Cardiovascular Cell Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 13092, Germany; German Center for Cardiovascular Research (DZHK), Partner site Berlin, Berlin 10117, Germany
Ulrich Laufs Klinik und Poliklinik für Kardiologie, Universitätskrankenhaus Leipzig, Leipzig 04103, Germany
Katrin Streckfuss-Bömeke Department of Cardiology and Pneumology, University Hospital, Georg-August University Goettingen, Goettingen 37075, Germany; German Center for Cardiovascular Research (DZHK), Partner site Goettingen, Goettingen 37075, Germany
Benjamin Meder Department of Cardiology, Angiology and Pneumology, University Hospital Heidelberg, Heidelberg 69120, Germany; German Center for Cardiovascular Research (DZHK), Partner site Heidelberg, Heidelberg 69120, Germany; Stanford Genome Technology Center, Department of Genetics, Stanford Medical School, Palo Alto, CA 94304, USA.

Song Z, Huang D, Song B, Chen K, Song Y, Liu G, Su J, Magalhães JPD, Rigden DJ, Meng J. Attention-based multi-label neural networks for integrated prediction and interpretation of twelve widely occurring RNA modifications. Nat Commun 2021;12:4011. [PMID: 34188054 PMCID: PMC8242015 DOI: 10.1038/s41467-021-24313-3] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2020] [Accepted: 06/07/2021] [Indexed: 02/08/2023] Open

Guo X, Ohler U, Yildirim F. How to find genomic regions relevant for gene regulation. MED GENET-BERLIN 2021;33:157-165. [PMID: 38836026 PMCID: PMC11007629 DOI: 10.1515/medgen-2021-2074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Accepted: 07/09/2021] [Indexed: 06/06/2024]

Back G, Walther D. Identification of cis-regulatory motifs in first introns and the prediction of intron-mediated enhancement of gene expression in Arabidopsis thaliana. BMC Genomics 2021;22:390. [PMID: 34039279 PMCID: PMC8157754 DOI: 10.1186/s12864-021-07711-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2021] [Accepted: 05/11/2021] [Indexed: 11/24/2022] Open

Abstract

BACKGROUND

Intron-mediated enhancement (IME) is the potential of introns to enhance the expression of their respective genes. This essential function of introns has been observed in a wide range of species, including fungi, plants, and animals. However, the mechanisms underlying the enhancement are as yet poorly understood. The goal of this study was to identify potential IME-related sequence motifs and genomic features in first introns of genes in Arabidopsis thaliana.

RESULTS

Based on the rationale that functional sequence motifs are evolutionarily conserved, we exploited the deep sequencing information available for Arabidopsis thaliana, covering more than one thousand Arabidopsis accessions, and identified 81 candidate hexamer motifs with increased conservation across all accessions that also exhibit positional occurrence preferences. Of those, 71 were found associated with increased correlation of gene expression of genes harboring them, suggesting a cis-regulatory role. Filtering further for effect on gene expression correlation yielded a set of 16 hexamer motifs, corresponding to five consensus motifs. While all five motifs represent new motif definitions, two are similar to the two previously reported IME-motifs, whereas three are altogether novel. Both consensus and hexamer motifs were found associated with higher expression of alleles harboring them as compared to alleles containing mutated motif variants as found in naturally occurring Arabidopsis accessions. To identify additional IME-related genomic features, Random Forest models were trained for the classification of gene expression level based on an array of sequence-related features. The results indicate that introns contain information with regard to gene expression level and suggest sequence-compositional features as most informative, while position-related features, thought to be of central importance before, were found with lower than expected relevance.
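One ingredient of the analysis above, tallying hexamers together with where they occur within introns, can be sketched as follows. This is only an illustration of k-mer counting with relative positions; it does not reproduce the study's conservation scoring or expression analysis, and the function names and toy sequences are assumptions.

```python
from collections import defaultdict

def hexamer_positions(introns, k=6):
    """Map each k-mer to the relative positions (0 = start, 1 = end) where it occurs."""
    positions = defaultdict(list)
    for seq in introns:
        n = len(seq) - k + 1  # number of k-mer windows in this sequence
        for i in range(n):
            positions[seq[i:i + k]].append(i / max(n - 1, 1))
    return positions

def summarize(positions):
    """Return {k-mer: (occurrence count, mean relative position)}."""
    return {kmer: (len(p), sum(p) / len(p)) for kmer, p in positions.items()}

# Made-up toy "introns" for illustration only.
introns = ["TTTTCTAGGT", "ACGTTTTTCT"]
summary = summarize(hexamer_positions(introns))
```

A hexamer whose relative positions cluster near one end across many introns would show a positional occurrence preference in the sense used above.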

CONCLUSIONS

Exploiting deep sequencing and broad gene expression information on a genome-wide scale, this study confirmed the regulatory role of first introns, characterized their intra-species conservation, and identified a set of novel sequence motifs located in first introns of genes in the genome of the plant Arabidopsis thaliana that may play a role in inducing high and correlated gene expression of the genes harboring them.

DeepTFactor: A deep learning-based tool for the prediction of transcription factors. Proc Natl Acad Sci U S A 2021;118:2021171118. [PMID: 33372147 DOI: 10.1073/pnas.2021171118] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

Sun L, Xu K, Huang W, Yang YT, Li P, Tang L, Xiong T, Zhang QC. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res 2021;31:495-516. [PMID: 33623109 PMCID: PMC7900654 DOI: 10.1038/s41422-021-00476-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Accepted: 01/19/2021] [Indexed: 01/31/2023] Open

Affiliation(s)

Lei Sun MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Kui Xu MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Wenze Huang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Yucheng T Yang Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Pan Li MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Lei Tang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Tuanlin Xiong MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
Qiangfeng Cliff Zhang MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China. Tsinghua-Peking Center for Life Sciences, Beijing 100084, China.

Koo PK, Majdandzic A, Ploenzke M, Anand P, Paul SB. Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput Biol 2021;17:e1008925. [PMID: 33983921 PMCID: PMC8118286 DOI: 10.1371/journal.pcbi.1008925] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 03/30/2021] [Indexed: 12/15/2022] Open

Yan Z, Hamilton WL, Blanchette M. Graph neural representational learning of RNA secondary structures for predicting RNA-protein interactions. Bioinformatics 2021;36:i276-i284. [PMID: 32657407 PMCID: PMC7355240 DOI: 10.1093/bioinformatics/btaa456] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Hafner M, Katsantoni M, Köster T, Marks J, Mukherjee J, Staiger D, Ule J, Zavolan M. CLIP and complementary methods. Nat Rev Methods Primers 2021. [DOI: 10.1038/s43586-021-00018-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Yang S, Liu X, Ng RT. ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. Bioinformatics 2021;36:4797-4804. [PMID: 32573679 PMCID: PMC7750938 DOI: 10.1093/bioinformatics/btaa580] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2019] [Revised: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 12/15/2022] Open

Abstract

MOTIVATION

The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to deciphering the protein-nucleic acid interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model the NBP binding preferences are mostly protein specific: they need the experimental data for a specific protein of interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown.

RESULTS

Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologous NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence.

AVAILABILITY AND IMPLEMENTATION

The source code is freely available at <https://github.com/syang11/ProbeRating>.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Miko H, Qiu Y, Gaertner B, Sander M, Ohler U. Inferring time series chromatin states for promoter-enhancer pairs based on Hi-C data. BMC Genomics 2021;22:84. [PMID: 33509077 PMCID: PMC7841892 DOI: 10.1186/s12864-021-07373-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Accepted: 01/07/2021] [Indexed: 11/12/2022] Open

Abstract

BACKGROUND

Co-localized combinations of histone modifications ("chromatin states") have been shown to correlate with promoter and enhancer activity. Changes in chromatin states over multiple time points ("chromatin state trajectories") have previously been analyzed at promoters and enhancers separately. With the advent of time series Hi-C data it is now possible to connect promoters and enhancers and to analyze chromatin state trajectories at promoter-enhancer pairs.

RESULTS

We present TimelessFlex, a framework for investigating chromatin state trajectories at promoters and enhancers and at promoter-enhancer pairs based on Hi-C information. TimelessFlex extends our previous approach Timeless, a Bayesian network for clustering multiple histone modification data sets at promoter and enhancer feature regions. We utilize time series ATAC-seq data measuring open chromatin to define promoters and enhancer candidates. We developed an expectation-maximization algorithm to assign promoters and enhancers to each other based on Hi-C interactions and jointly cluster their feature regions into paired chromatin state trajectories. We find jointly clustered promoter-enhancer pairs showing the same activation patterns on both sides but with a stronger trend at the enhancer side. While the promoter side remains accessible across the time series, the enhancer side becomes dynamically more open towards the gene activation time point. Promoter cluster patterns show strong correlations with gene expression signals, whereas Hi-C signals get only slightly stronger towards activation. The code of the framework is available at https://github.com/henriettemiko/TimelessFlex.

CONCLUSIONS

TimelessFlex clusters time series histone modifications at promoter-enhancer pairs based on Hi-C, and it can identify distinct chromatin states at promoter and enhancer feature regions and their changes over time.

Asif M, Orenstein Y. DeepSELEX: inferring DNA-binding preferences from HT-SELEX data using multi-class CNNs. Bioinformatics 2020;36:i634-i642. [PMID: 33381817 DOI: 10.1093/bioinformatics/btaa789] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/15/2022]

Long-read RNA sequencing of human and animal filarial parasites improves gene models and discovers operons. PLoS Negl Trop Dis 2020;14:e0008869. [PMID: 33196647 PMCID: PMC7704054 DOI: 10.1371/journal.pntd.0008869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Revised: 11/30/2020] [Accepted: 10/09/2020] [Indexed: 01/01/2023]

Abstract

Filarial parasitic nematodes (Filarioidea) cause substantial disease burden to humans and animals around the world. Recently there has been a coordinated global effort to generate, annotate, and curate genomic data from nematode species of medical and veterinary importance. This has resulted in two chromosome-level assemblies (Brugia malayi and Onchocerca volvulus) and 11 additional draft genomes from Filarioidea. These reference assemblies facilitate comparative genomics to explore basic helminth biology and prioritize new drug and vaccine targets. While the continual improvement of genome contiguity and completeness advances these goals, experimental functional annotation of genes is often hindered by poor gene models. Short-read RNA sequencing data and expressed sequence tags, in cooperation with ab initio prediction algorithms, are employed for gene prediction, but these can result in missing clade-specific genes, fragmented models, imperfect mapping of gene ends, and lack of isoform resolution. Long-read RNA sequencing can overcome these drawbacks and greatly improve gene model quality. Here, we present Iso-Seq data for B. malayi and Dirofilaria immitis, etiological agents of lymphatic filariasis and canine heartworm disease, respectively. These data cover approximately half of the known coding genomes and substantially improve gene models by extending untranslated regions, cataloging novel splice junctions from novel isoforms, and correcting mispredicted junctions. Furthermore, we validated computationally predicted operons, manually curated new operons, and merged fragmented gene models. We carried out analyses of poly(A) tails in both species, leading to the identification of non-canonical poly(A) signals. Finally, we prioritized and assessed known and putative anthelmintic targets, correcting or validating gene models for molecular cloning and target-based anthelmintic screening efforts. Overall, these data significantly improve the catalog of gene models for two important parasites, and they demonstrate how long-read RNA sequencing should be prioritized for ongoing improvement of parasitic nematode genome assemblies.

Filarial parasitic nematodes are vector-borne parasites that infect humans and animals. Brugia malayi and Dirofilaria immitis are transmitted by mosquitoes and cause human lymphatic filariasis and canine heartworm disease, respectively. Recent years have seen a dramatic increase in genomic and transcriptomic data sets and the concomitant increase in innovative strategies for drug target identification, validation, and screening. However, while the completeness of genome assemblies of filarial parasitic nematodes has seen steady improvements, the reliability of gene models has not kept pace, hindering cloning efforts. Long-read RNA sequencing technologies are uniquely able to improve gene models, but have not been widely used for the causative agents of neglected tropical diseases. Here, we report the improvement of gene models in both B. malayi and D. immitis by long-read RNA sequencing. We identified novel operons, deprecated false positive operons, identified dozens of novel genes, and described the parameters of polyadenylation. We also focused on putative anthelmintic targets, identifying novel isoforms and correcting gene models. These data substantially increase the trustworthiness of gene models in these two species and demonstrate how long-read sequencing approaches should be prioritized in the continued improvement of genome assemblies and their gene annotations.

Song J, Tian S, Yu L, Xing Y, Yang Q, Duan X, Dai Q. AC-Caps: Attention Based Capsule Network for Predicting RBP Binding Sites of LncRNA. Interdiscip Sci 2020;12:414-423. [PMID: 32572768 DOI: 10.1007/s12539-020-00379-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/11/2020] [Revised: 05/18/2020] [Accepted: 05/30/2020] [Indexed: 01/03/2023]