1
Harris SE, Alexis MS, Giri G, Cavazos FF, Hu Y, Murn J, Aleman MM, Burge CB, Dominguez D. Understanding species-specific and conserved RNA-protein interactions in vivo and in vitro. Nat Commun 2024; 15:8400. [PMID: 39333159 PMCID: PMC11436793 DOI: 10.1038/s41467-024-52231-7] [Received: 12/20/2023] [Accepted: 08/28/2024] [Indexed: 09/29/2024]
Abstract
While evolution is often considered from a DNA- and protein-centric view, RNA-based regulation can also impact gene expression and protein sequences. Here we examine interspecies differences in RNA-protein interactions using the conserved neuronal RNA-binding protein Unkempt (UNK) as a model. We find that roughly half of mRNAs bound in human are also bound in mouse. Unexpectedly, even when transcript-level binding was conserved across species, differential motif usage was prevalent. To understand the biochemical basis of UNK-RNA interactions, we reconstitute the human and mouse UNK-RNA interactomes using a high-throughput biochemical assay. We uncover detailed features driving binding, show that in vivo patterns are captured in vitro, find that highly conserved sites are the strongest bound, and associate binding strength with downstream regulation. Furthermore, subtle sequence differences surrounding motifs are key determinants of species-specific binding. We highlight the complex features driving protein-RNA interactions and how these evolve to confer species-specific regulation.
Affiliation(s)
- Sarah E Harris
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
- Maria S Alexis
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Remix Therapeutics, Cambridge, MA, USA
- Gilbert Giri
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA
- Francisco F Cavazos
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
- Yue Hu
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
- Jernej Murn
  - Department of Biochemistry, University of California, Riverside, CA, USA
  - Center for RNA Biology and Medicine, Riverside, CA, USA
- Maria M Aleman
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
- Christopher B Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Daniel Dominguez
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC, USA
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC, USA
  - RNA Discovery Center, University of North Carolina, Chapel Hill, NC, USA
2
Zhao J, Gui Y, Wu W, Li X, Wang L, Wang H, Luo Y, Zhou G, Yuan C. The function of long non-coding RNA IFNG-AS1 in autoimmune diseases. Hum Cell 2024; 37:1325-1335. [PMID: 39004663 DOI: 10.1007/s13577-024-01103-9] [Received: 04/20/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Autoimmune diseases rank as the third most common disease category globally, following cancer and heart disease. Numerous studies indicate that long non-coding RNAs (lncRNAs) play a pivotal role in regulating human growth, development, and the pathogenesis of various diseases. LncRNAs are more than 200 nucleotides in length and are mostly involved in the regulation of gene expression. Furthermore, lncRNAs are crucial in the development and activation of immune cells, with an expanding body of research exploring their association with autoimmune disorders in humans. LncRNA Ifng antisense RNA 1 (IFNG-AS1), also named NeST or TMEVPG1, is a key regulatory factor in the immune system; it is located proximal to IFNG and participates in its regulation. The dysregulation of IFNG-AS1 is implicated in the pathogenesis of several autoimmune diseases. This study examines the role and mechanism of IFNG-AS1 in various autoimmune diseases and considers its potential as a therapeutic target.
Affiliation(s)
- Jiale Zhao
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - College of Medicine and Health Science, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
- Yibei Gui
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
  - College of Basic Medical Science, China Three Gorges University, Yichang, 443002, China
- Wei Wu
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - College of Medicine and Health Science, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
- Xueqing Li
  - College of Medicine and Health Science, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
- Lijun Wang
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
  - College of Basic Medical Science, China Three Gorges University, Yichang, 443002, China
- Hailin Wang
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - College of Medicine and Health Science, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
- Yiyang Luo
  - College of Medicine and Health Science, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
- Gang Zhou
  - College of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
  - Yichang Hospital of Traditional Chinese Medicine, Yichang, 443002, China
- Chengfu Yuan
  - Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, 443002, China
  - Third-Grade Pharmacological Laboratory on Traditional Chinese Medicine, State Administration of Traditional Chinese Medicine, China Three Gorges University, Yichang, 443002, China
  - College of Basic Medical Science, China Three Gorges University, Yichang, 443002, China
3
Xu Q, Bao X, Lin Z, Tang L, He LN, Ren J, Zuo Z, Hu K. AStruct: detection of allele-specific RNA secondary structure in structuromic probing data. BMC Bioinformatics 2024; 25:91. [PMID: 38429654 PMCID: PMC11264973 DOI: 10.1186/s12859-024-05704-x] [Received: 10/23/2023] [Accepted: 02/14/2024] [Indexed: 03/03/2024]
Abstract
BACKGROUND: Uncovering functional genetic variants from an allele-specific perspective is of paramount importance in advancing our understanding of gene regulation and genetic diseases. Recently, various allele-specific events, such as allele-specific gene expression, allele-specific methylation, and allele-specific binding, have been explored on a genome-wide scale due to the development of high-throughput sequencing methods. RNA secondary structure, which plays a crucial role in multiple RNA-associated processes like RNA modification, translation and splicing, has emerged as an essential focus of relevant research. However, tools to identify genetic variants associated with allele-specific RNA secondary structures are still lacking.
RESULTS: Here, we develop a computational tool called 'AStruct' that enables us to detect allele-specific RNA secondary structure (ASRS) from RT-stop based structuromic probing data. AStruct shows robust performance in both simulated datasets and public icSHAPE datasets. We reveal that single nucleotide polymorphisms (SNPs) with higher AStruct scores are enriched in coding regions and tend to be functional. These SNPs are highly conserved, have the potential to disrupt sites involved in m6A modification or protein binding, and are frequently associated with disease.
CONCLUSIONS: AStruct is a tool dedicated to calling allele-specific RNA secondary structure events at heterozygous SNPs in RT-stop based structuromic probing data. It utilizes allelic variants, base pairing and RT-stop information under different cell conditions to detect dynamic and functional ASRS. Compared to sequence-based tools, AStruct considers dynamic cell conditions and performs better at detecting functional variants. AStruct is implemented in Java and is freely accessible at: https://github.com/canceromics/AStruct.
Affiliation(s)
- Qingru Xu
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Xiaoqiong Bao
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Zhuobin Lin
  - Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Lin Tang
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Li-Na He
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Jian Ren
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Zhixiang Zuo
  - State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510060, China
- Kunhua Hu
  - Guangdong Key Laboratory of Liver Disease Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
4
Romo L, Findlay SD, Burge CB. Regulatory features aid interpretation of 3'UTR variants. Am J Hum Genet 2024; 111:350-363. [PMID: 38237594 PMCID: PMC10870128 DOI: 10.1016/j.ajhg.2023.12.017] [Received: 08/01/2023] [Revised: 12/13/2023] [Accepted: 12/14/2023] [Indexed: 01/30/2024]
Abstract
Our ability to determine the clinical impact of variants in 3' untranslated regions (UTRs) of genes remains poor. We provide a thorough analysis of 3' UTR variants from several datasets. Variants in putative regulatory elements, including RNA-binding protein motifs, eCLIP peaks, and microRNA sites, are up to 16 times more likely than variants not in these elements to have gene expression and phenotype associations. Variants in regulatory motifs result in allele-specific protein binding in cell lines and allele-specific gene expression differences in population studies. In addition, variants in shared regions of alternatively polyadenylated isoforms and those proximal to polyA sites are more likely to affect gene expression and phenotype. Finally, pathogenic 3' UTR variants in ClinVar are up to 20 times more likely than benign variants to fall in a regulatory site. We incorporated these findings into RegVar, a software tool that interprets regulatory elements and annotations for any 3' UTR variant and predicts whether the variant is likely to affect gene expression or phenotype. This tool will help prioritize variants for experimental studies and identify pathogenic variants in individuals.
Affiliation(s)
- Lindsay Romo
  - Harvard Medical Genetics Training Program, Boston Children's Hospital, Boston, MA 02115, USA
- Scott D Findlay
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Christopher B Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
5
Harris SE, Alexis MS, Giri G, Cavazos FF, Murn J, Aleman MM, Burge CB, Dominguez D. Understanding species-specific and conserved RNA-protein interactions in vivo and in vitro. bioRxiv 2024:2024.01.29.577729. [PMID: 38352439 PMCID: PMC10862761 DOI: 10.1101/2024.01.29.577729] [Indexed: 02/22/2024]
Abstract
While evolution is often considered from a DNA- and protein-centric view, RNA-based regulation can also impact gene expression and protein sequences. Here we examined interspecies differences in RNA-protein interactions using the conserved neuronal RNA binding protein Unkempt (UNK) as a model. We find that roughly half of mRNAs bound in human are also bound in mouse. Unexpectedly, even when transcript-level binding was conserved across species, differential motif usage was prevalent. To understand the biochemical basis of UNK-RNA interactions, we reconstituted the human and mouse UNK-RNA interactomes using a high-throughput biochemical assay. We uncover detailed features driving binding, show that in vivo patterns are captured in vitro, find that highly conserved sites are the strongest bound, and associate binding strength with downstream regulation. Furthermore, subtle sequence differences surrounding motifs are key determinants of species-specific binding. We highlight the complex features driving protein-RNA interactions and how these evolve to confer species-specific regulation.
Affiliation(s)
- Sarah E. Harris
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
- Maria S. Alexis
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA
  - Current address: Remix Therapeutics, Cambridge, MA
- Gilbert Giri
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC
- Jernej Murn
  - Department of Biochemistry, University of California, Riverside, CA
  - Center for RNA Biology and Medicine, Riverside, CA
- Maria M. Aleman
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
- Daniel Dominguez
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC
  - Department of Pharmacology, University of North Carolina, Chapel Hill, NC
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC
  - RNA Discovery Center, University of North Carolina, Chapel Hill, NC
6
Findlay SD, Romo L, Burge CB. Quantifying negative selection in human 3' UTRs uncovers constrained targets of RNA-binding proteins. Nat Commun 2024; 15:85. [PMID: 38168060 PMCID: PMC10762232 DOI: 10.1038/s41467-023-44456-9] [Received: 12/13/2022] [Accepted: 12/14/2023] [Indexed: 01/05/2024]
Abstract
Many non-coding variants associated with phenotypes occur in 3' untranslated regions (3' UTRs), and may affect interactions with RNA-binding proteins (RBPs) to regulate gene expression post-transcriptionally. However, identifying functional 3' UTR variants has proven difficult. We use allele frequencies from the Genome Aggregation Database (gnomAD) to identify classes of 3' UTR variants under strong negative selection in humans. We develop intergenic mutability-adjusted proportion singleton (iMAPS), a generalized measure related to MAPS, to quantify negative selection in non-coding regions. This approach, in conjunction with in vitro and in vivo binding data, identifies precise RBP binding sites, miRNA target sites, and polyadenylation signals (PASs) under strong selection. For each class of sites, we identify thousands of gnomAD variants under selection comparable to missense coding variants, and find that sites in core 3' UTR regions upstream of the most-used PAS are under strongest selection. Together, this work improves our understanding of selection on human genes and validates approaches for interpreting genetic variants in human 3' UTRs.
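The general idea behind a mutability-adjusted proportion of singletons can be sketched as follows: calibrate the expected singleton proportion against mutability on a presumed-neutral reference set, then report observed minus expected for a class of sites. This is a simplified illustration under stated assumptions, not the authors' iMAPS implementation; the function names, the unweighted least-squares fit, and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x (pure Python)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_proportion_singleton(n_singletons, n_variants, mean_mutability, calibration):
    """Observed minus expected singleton proportion for one variant class.

    calibration: (intercept, slope) fitted on a presumed-neutral reference set
    (an assumption of this sketch; the reference set is not specified here).
    """
    intercept, slope = calibration
    observed = n_singletons / n_variants
    expected = intercept + slope * mean_mutability
    return observed - expected


# Toy calibration data (invented): mutability bins and their singleton proportions.
ref_mutability = [1.0, 2.0, 3.0, 4.0]
ref_prop_singleton = [0.60, 0.55, 0.50, 0.45]
calibration = fit_line(ref_mutability, ref_prop_singleton)

# Hypothetical site class: 700 singletons among 1000 variants, mean mutability 2.0.
score = adjusted_proportion_singleton(700, 1000, 2.0, calibration)
```

In this sketch a positive score means an excess of singletons relative to the calibration set, which is the pattern expected when variants in the class are kept rare by negative selection.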
Affiliation(s)
- Scott D Findlay
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Lindsay Romo
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
  - Boston Children's Hospital, Boston, MA, 02115, USA
- Christopher B Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
7
Li Y, Zhang XO, Liu Y, Lu A. Allele-specific binding (ASB) analyzer for annotation of allele-specific binding SNPs. BMC Bioinformatics 2023; 24:464. [PMID: 38066439 PMCID: PMC10709849 DOI: 10.1186/s12859-023-05604-6] [Received: 09/13/2022] [Accepted: 12/05/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND: Allele-specific binding (ASB) events occur when transcription factors (TFs) bind more favorably to one of the two parental alleles at heterozygous single nucleotide polymorphisms (SNPs). Evidence suggests that ASB events could reveal the impact of sequence variations on TF binding and may have implications for the risk of diseases.
RESULTS: Here we present ASB-analyzer, a software platform that enables users to quickly and efficiently input raw sequencing data to generate individual reports containing the cytogenetic map of ASB SNPs and their associated phenotypes. This interactive tool thereby combines ASB SNP identification, biological annotation, motif analysis, phenotype associations and report summary in one pipeline. With this pipeline, we identified 3772 ASB SNPs from thirty GM12878 ChIP-seq datasets and demonstrated that the ASB SNPs were more likely to be enriched at important sites in TF-binding domains.
CONCLUSIONS: ASB-analyzer is a user-friendly tool that enables the detection, characterization and visualization of ASB SNPs. It is implemented in Python, R and Bash and packaged in a Conda environment. It is available as an open-source tool on GitHub at https://github.com/Liying1996/ASBanalyzer.
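The notion of allele-specific binding described in this abstract can be illustrated with an exact two-sided binomial test on reference and alternate read counts at a heterozygous SNP. This is a minimal sketch and not ASB-analyzer's actual pipeline; the function name `allelic_imbalance_pvalue`, the null allelic ratio of 0.5, and the example counts are assumptions made for illustration.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one heterozygous SNP.

    Hypothetical helper: under the null, each read carries the reference allele
    with probability p_null (0.5 means no allelic preference).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("at least one read is required")
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Two-sided p-value: sum the outcomes that are no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Illustrative counts only (not taken from any dataset).
balanced = allelic_imbalance_pvalue(10, 10)
skewed = allelic_imbalance_pvalue(18, 2)
```

A real analysis would typically also correct for reference-mapping bias and for multiple testing across SNPs; both are omitted here to keep the sketch short.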
Affiliation(s)
- Ying Li
  - Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
- Xiao-Ou Zhang
  - Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
- Yan Liu
  - Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
- Aiping Lu
  - Research Center for Translational Medicine at East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200010, China
8
Zhu H, Yang Y, Wang Y, Wang F, Huang Y, Chang Y, Wong KC, Li X. Dynamic characterization and interpretation for protein-RNA interactions across diverse cellular conditions using HDRNet. Nat Commun 2023; 14:6824. [PMID: 37884495 PMCID: PMC10603054 DOI: 10.1038/s41467-023-42547-1] [Received: 02/23/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023]
Abstract
RNA-binding proteins play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods pose challenges to the cross-prediction of RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.
Affiliation(s)
- Haoran Zhu
  - School of Artificial Intelligence, Jilin University, 130012, Changchun, China
- Yuning Yang
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Yunhe Wang
  - School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Yujian Huang
  - College of Computer Science and Cyber Security, Chengdu University of Technology, 610059, Chengdu, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, 130012, Changchun, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, 130012, Changchun, China
9
Romo L, Findlay SD, Burge CB. Regulatory features aid interpretation of 3'UTR Variants. bioRxiv 2023:2023.08.01.551549. [PMID: 37577470 PMCID: PMC10418266 DOI: 10.1101/2023.08.01.551549] [Indexed: 08/15/2023]
Abstract
Our ability to determine the clinical impact of variants in 3' untranslated regions (UTRs) of genes remains poor. We provide a thorough analysis of 3'UTR variants from several datasets. Variants in putative regulatory elements including RNA-binding protein motifs, eCLIP peaks, and microRNA sites are up to 16 times more likely than other variants to have gene expression and phenotype associations. Heterozygous variants in regulatory motifs result in allele-specific protein binding in cell lines and allele-specific gene expression differences in population studies. In addition, variants in shared regions of alternatively polyadenylated isoforms and those proximal to polyA sites are more likely to affect gene expression and phenotype. Finally, pathogenic 3'UTR variants in ClinVar are 20 times more likely than benign variants to fall in a regulatory site. We incorporated these findings into RegVar, a software tool that interprets regulatory elements and annotations for any 3'UTR variant, and predicts whether the variant is likely to affect gene expression or phenotype. This tool will help prioritize variants for experimental studies and identify pathogenic variants in patients.
Affiliation(s)
- Lindsay Romo
  - Harvard Medical Genetics Training Program, Boston Children’s Hospital, Boston, MA 02115
- Scott D. Findlay
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142
- Christopher B. Burge
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142
10
Cao S, Zhu H, Cui J, Liu S, Li Y, Shi J, Mo J, Wang Z, Wang H, Hu J, Chen L, Li Y, Xia L, Xiao S. Allele-specific RNA N6-methyladenosine modifications reveal functional genetic variants in human tissues. Genome Res 2023; 33:1369-1380. [PMID: 37714712 PMCID: PMC10547253 DOI: 10.1101/gr.277704.123] [Received: 01/15/2023] [Accepted: 06/13/2023] [Indexed: 09/17/2023]
Abstract
An intricate network of cis- and trans-elements acts on RNA N6-methyladenosine (m6A), which in turn may affect gene expression and, ultimately, human health. A complete understanding of this network requires new approaches to accurately measure the subtle m6A differences arising from genetic variants, many of which have been associated with common diseases. To address this gap, we developed a method to accurately and sensitively detect transcriptome-wide allele-specific m6A (ASm6A) from MeRIP-seq data and applied it to uncover 12,056 high-confidence ASm6A modifications from 25 human tissues. We also identified 1184 putative functional variants for ASm6A regulation, a subset of which we experimentally validated. Importantly, we found that many of these ASm6A-associated genetic variants were enriched for common disease-associated and complex trait-associated risk loci, and verified that two disease risk variants can change m6A modification status. Together, this work provides a tool to detangle the dynamic network of RNA modifications at the allelic level and highlights the interplay of m6A and genetics in human health and disease.
Affiliation(s)
- Shuo Cao
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Haoran Zhu
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jinru Cui
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Sun Liu
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yuhe Li
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Junfang Shi
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Junyuan Mo
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zihan Wang
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Hailan Wang
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jiaxin Hu
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Lizhi Chen
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yuan Li
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Laixin Xia
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shan Xiao
  - Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
  - Guangdong Provincial Key Laboratory of Cardiac Function and Microcirculation, Guangzhou 510515, China
11
Boyle EA, Her HL, Mueller JR, Naritomi JT, Nguyen GG, Yeo GW. Skipper analysis of eCLIP datasets enables sensitive detection of constrained translation factor binding sites. Cell Genomics 2023; 3:100317. [PMID: 37388912 PMCID: PMC10300551 DOI: 10.1016/j.xgen.2023.100317] [Received: 10/20/2022] [Revised: 02/17/2023] [Accepted: 04/06/2023] [Indexed: 07/01/2023]
Abstract
Technology for crosslinking and immunoprecipitation (CLIP) followed by sequencing (CLIP-seq) has identified the transcriptomic targets of hundreds of RNA-binding proteins in cells. To increase the power of existing and future CLIP-seq datasets, we introduce Skipper, an end-to-end workflow that converts unprocessed reads into annotated binding sites using an improved statistical framework. Compared with existing methods, Skipper on average calls 210%-320% more transcriptomic binding sites and sometimes >1,000% more sites, providing deeper insight into post-transcriptional gene regulation. Skipper also calls binding to annotated repetitive elements and identifies bound elements for 99% of enhanced CLIP experiments. We perform nine translation factor enhanced CLIPs and apply Skipper to learn determinants of translation factor occupancy, including transcript region, sequence, and subcellular localization. Furthermore, we observe depletion of genetic variation in occupied sites and nominate transcripts subject to selective constraint because of translation factor occupancy. Skipper offers fast, easy, customizable, and state-of-the-art analysis of CLIP-seq data.
Affiliation(s)
- Evan A. Boyle
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
- Hsuan-Lin Her
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
- Jasmine R. Mueller
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
- Jack T. Naritomi
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
- Grady G. Nguyen
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
- Gene W. Yeo
  - Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, UCSD Stem Cell Program, University of California San Diego, La Jolla, CA 92093, USA
| |
|
12
|
Zheng R, Dunlap M, Lyu J, Gonzalez-Figueroa C, Bobkov G, Harvey SE, Chan TW, Quinones-Valdez G, Choudhury M, Vuong A, Flynn RA, Chang HY, Xiao X, Cheng C. LINE-associated cryptic splicing induces dsRNA-mediated interferon response and tumor immunity. bioRxiv 2023:2023.02.23.529804. [PMID: 36865202 PMCID: PMC9980139 DOI: 10.1101/2023.02.23.529804] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/02/2023]
Abstract
RNA splicing plays a critical role in post-transcriptional gene regulation. Exponential expansion of intron length poses a challenge for accurate splicing. Little is known about how cells prevent inadvertent and often deleterious expression of intronic elements due to cryptic splicing. In this study, we identify hnRNPM as an essential RNA binding protein that suppresses cryptic splicing through binding to deep introns, preserving transcriptome integrity. Long interspersed nuclear elements (LINEs) harbor large amounts of pseudo splice sites in introns. hnRNPM preferentially binds at intronic LINEs and represses LINE-containing pseudo splice site usage for cryptic splicing. Remarkably, a subgroup of the cryptic exons can form long dsRNAs through base-pairing of inverted Alu transposable elements scattered in between LINEs and trigger interferon immune response, a well-known antiviral defense mechanism. Notably, these interferon-associated pathways are found to be upregulated in hnRNPM-deficient tumors, which also exhibit elevated immune cell infiltration. These findings unveil hnRNPM as a guardian of transcriptome integrity. Targeting hnRNPM in tumors may be used to trigger an inflammatory immune response thereby boosting cancer surveillance.
|
13
|
Chu YD, Fan TC, Lai MW, Yeh CT. GALNT14-mediated O-glycosylation on PHB2 serine-161 enhances cell growth, migration and drug resistance by activating IGF1R cascade in hepatoma cells. Cell Death Dis 2022; 13:956. [PMID: 36376274 PMCID: PMC9663550 DOI: 10.1038/s41419-022-05419-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/16/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022]
Abstract
The single nucleotide polymorphism (SNP) rs9679162 located on GALNT14 gene predicts therapeutic outcomes in patients with intermediate and advanced hepatocellular carcinoma (HCC), but the molecular mechanism remains unclear. Here, the associations between SNP genotypes, GALNT14 expression, and downstream molecular events were determined. A higher GALNT14 cancerous/noncancerous ratio was associated with the rs9679162-GG genotype, leading to an unfavorable postoperative prognosis. A novel exon-6-skipped GALNT14 mRNA variant was identified in patients carrying the rs9679162-TT genotype, which was associated with lower GALNT14 expression and favorable prognosis. Cell-based experiments showed that elevated levels of GALNT14 promoted HCC growth, migration, and resistance to anticancer drugs. Using a comparative lectin-capture glycoproteomic approach, PHB2 was identified as a substrate for GALNT14-mediated O-glycosylation. Site-directed mutagenesis experiments revealed that serine-161 (Ser161) was the O-glycosylation site. Further analysis showed that O-glycosylation of PHB2-Ser161 was required for the GALNT14-mediated growth-promoting phenotype. O-glycosylation of PHB2 was positively correlated with GALNT14 expression in HCC, resulting in increased interaction between PHB2 and IGFBP6, which in turn led to the activation of IGF1R-mediated signaling. In conclusion, the GALNT14-rs9679162 genotype was associated with differential expression levels of GALNT14 and the generation of a novel exon-6-skipped GALNT14 mRNA variant, which was associated with a favorable prognosis in HCC. The GALNT14/PHB2/IGF1R cascade modulated the growth, migration, and anticancer drug resistance of HCC cells, thereby opening the possibility of identifying new therapeutic targets against HCC.
Affiliation(s)
- Yu-De Chu
- Liver Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Tan-Chi Fan
- Institute of Stem Cell and Translational Cancer Research, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
| | - Ming-Wei Lai
- Liver Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Division of Pediatric Gastroenterology, Department of Pediatrics, Linkou Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Chau-Ting Yeh
- Liver Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Molecular Medicine Research Center, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| |
|
14
|
Liu CX, Chen LL. Circular RNAs: Characterization, cellular roles, and applications. Cell 2022; 185:2016-2034. [PMID: 35584701 DOI: 10.1016/j.cell.2022.04.021] [Citation(s) in RCA: 370] [Impact Index Per Article: 185.0] [Received: 02/16/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Most circular RNAs are produced from the back-splicing of exons of precursor mRNAs. Recent technological advances have in part overcome problems with their circular conformation and sequence overlap with linear cognate mRNAs, allowing a better understanding of their cellular roles. Depending on their localization and specific interactions with DNA, RNA, and proteins, circular RNAs can modulate transcription and splicing, regulate stability and translation of cytoplasmic mRNAs, interfere with signaling pathways, and serve as templates for translation in different biological and pathophysiological contexts. Emerging applications of RNA circles to interfere with cellular processes, modulate immune responses, and direct translation into proteins shed new light on biomedical research. In this review, we discuss approaches used in circular RNA studies and the current understanding of their regulatory roles and potential applications.
Affiliation(s)
- Chu-Xiao Liu
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Ling-Ling Chen
- State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China; School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China.
| |
|
15
|
Ha D, Kim D, Kim I, Oh Y, Kong J, Han S, Kim S. Evolutionary rewiring of regulatory networks contributes to phenotypic differences between human and mouse orthologous genes. Nucleic Acids Res 2022; 50:1849-1863. [PMID: 35137181 PMCID: PMC8887464 DOI: 10.1093/nar/gkac050] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/25/2021] [Revised: 01/14/2022] [Accepted: 01/25/2022] [Indexed: 11/14/2022] Open
Abstract
Mouse models have been engineered to reveal the biological mechanisms of human diseases based on the assumption that orthologous genes underlie conserved phenotypes across species. However, genetically modified mouse orthologs of human genes often fail to recapitulate human disease phenotypes, which might be due to the molecular evolution of phenotypic differences across species since the last common ancestor. Here, we systematically investigated the evolutionary divergence of regulatory relationships between transcription factors (TFs) and target genes in functional modules, and found that the rewiring of gene regulatory networks (GRNs) contributes to the phenotypic discrepancies that occur between humans and mice. We confirmed that the rewired regulatory networks of orthologous genes contain a higher proportion of species-specific regulatory elements. Additionally, we verified that the divergence of target gene expression levels, which was triggered by network rewiring, could lead to phenotypic differences. Taken together, a careful consideration of evolutionary divergence in regulatory networks could be a novel strategy to understand the failure or success of mouse models to mimic human diseases. To help interpret mouse phenotypes in human disease studies, we provide quantitative comparisons of gene expression profiles on our website (http://sbi.postech.ac.kr/w/RN).
Affiliation(s)
- Doyeon Ha
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
| | - Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
| | | | - Youngchul Oh
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
| | - JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
| | - Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
| | - Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, Korea
- Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, Korea
- Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, Korea
| |
|
16
|
Abstract
Most of the transcribed human genome codes for noncoding RNAs (ncRNAs), and long noncoding RNAs (lncRNAs) make up the lion's share of the human ncRNA space. Despite growing interest in lncRNAs, because there are so many of them, and because of their tissue specialization and, often, lower abundance, their catalog remains incomplete and there are multiple ongoing efforts to improve it. Consequently, the number of human lncRNA genes may be lower than 10,000 or higher than 200,000. A key open challenge for lncRNA research, now that so many lncRNA species have been identified, is the characterization of lncRNA function and the interpretation of the roles of genetic and epigenetic alterations at their loci. After all, the most important human genes to catalog and study are those that contribute to important cellular functions, namely those that affect development or cell differentiation and whose dysregulation may play a role in the genesis and progression of human diseases. Multiple efforts have used screens based on RNA-mediated interference (RNAi), antisense oligonucleotide (ASO), and CRISPR screens to identify the consequences of lncRNA dysregulation and predict lncRNA function in select contexts, but these approaches have unresolved scalability and accuracy challenges. Instead, as was the case for better-studied ncRNAs in the past, researchers often focus on characterizing lncRNA interactions and investigating their effects on genes and pathways with known functions. Here, we focus most of our review on computational methods to identify lncRNA interactions and to predict the effects of their alterations and dysregulation on human disease pathways.
|
17
|
Abstract
Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes.
Affiliation(s)
- Siobhan Cleary
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland;
| | - Cathal Seoighe
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway H91 H3CY, Ireland;
| |
|
18
|
Genetic drivers of m6A methylation in human brain, lung, heart and muscle. Nat Genet 2021; 53:1156-1165. [PMID: 34211177 DOI: 10.1038/s41588-021-00890-3] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 09/05/2020] [Accepted: 05/18/2021] [Indexed: 01/22/2023]
Abstract
The most prevalent post-transcriptional mRNA modification, N6-methyladenosine (m6A), plays diverse RNA-regulatory roles, but its genetic control in human tissues remains uncharted. Here we report 129 transcriptome-wide m6A profiles, covering 91 individuals and 4 tissues (brain, lung, muscle and heart) from GTEx/eGTEx. We integrate these with interindividual genetic and expression variation, revealing 8,843 tissue-specific and 469 tissue-shared m6A quantitative trait loci (QTLs), which are modestly enriched in, but mostly orthogonal to, expression QTLs. We integrate m6A QTLs with disease genetics, identifying 184 GWAS-colocalized m6A QTL, including brain m6A QTLs underlying neuroticism, depression, schizophrenia and anxiety; lung m6A QTLs underlying expiratory flow and asthma; and muscle/heart m6A QTLs underlying coronary artery disease. Last, we predict novel m6A regulators that show preferential binding in m6A QTLs, protein interactions with known m6A regulators and expression correlation with the m6A levels of their targets. Our results provide important insights and resources for understanding both cis and trans regulation of epitranscriptomic modifications, their interindividual variation and their roles in human disease.
|
19
|
Hotspot exons are common targets of splicing perturbations. Nat Commun 2021; 12:2756. [PMID: 33980843 PMCID: PMC8115636 DOI: 10.1038/s41467-021-22780-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/03/2019] [Accepted: 02/24/2021] [Indexed: 11/08/2022] Open
Abstract
High-throughput splicing assays have demonstrated that many exonic variants can disrupt splicing; however, splice-disrupting variants distribute non-uniformly across genes. We propose the existence of exons that are particularly susceptible to splice-disrupting variants, which we refer to as hotspot exons. Hotspot exons are also more susceptible to splicing perturbation through drug treatment and knock-down of RNA-binding proteins. We develop a classifier for exonic splice-disrupting variants and use it to infer hotspot exons. We estimate that 1400 exons in the human genome are hotspots. Using panels of splicing reporters, we demonstrate how the ability of an exon to tolerate a mutation is inversely proportional to the strength of its neighboring splice sites. Splicing-disrupting mutations are linked to diseases. By employing a machine learning approach, the authors show that certain exons, termed hotspot exons, are enriched for splicing-disruption variants and susceptible to splicing perturbations.
|
20
|
Sun L, Xu K, Huang W, Yang YT, Li P, Tang L, Xiong T, Zhang QC. Predicting dynamic cellular protein-RNA interactions by deep learning using in vivo RNA structures. Cell Res 2021; 31:495-516. [PMID: 33623109 PMCID: PMC7900654 DOI: 10.1038/s41422-021-00476-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 12/23/2020] [Accepted: 01/19/2021] [Indexed: 01/31/2023] Open
Abstract
Interactions with RNA-binding proteins (RBPs) are integral to RNA function and cellular regulation, and dynamically reflect specific cellular conditions. However, presently available tools for predicting RBP-RNA interactions employ RNA sequence and/or predicted RNA structures, and therefore do not capture their condition-dependent nature. Here, after profiling transcriptome-wide in vivo RNA secondary structures in seven cell types, we developed PrismNet, a deep learning tool that integrates experimental in vivo RNA structure data and RBP binding data for matched cells to accurately predict dynamic RBP binding in various cellular conditions. PrismNet results for 168 RBPs support its utility for both understanding CLIP-seq results and largely extending such interaction data to accurately analyze additional cell types. Further, PrismNet employs an "attention" strategy to computationally identify exact RBP-binding nucleotides, and we discovered enrichment among dynamic RBP-binding sites for structure-changing variants (riboSNitches), which can link genetic diseases with dysregulated RBP bindings. Our rich profiling data and deep learning-based prediction tool provide access to a previously inaccessible layer of cell-type-specific RBP-RNA interactions, with clear utility for understanding and treating human diseases.
Affiliation(s)
- Lei Sun
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Kui Xu
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Wenze Huang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Yucheng T Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Pan Li
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Lei Tang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Tuanlin Xiong
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China
| | - Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Beijing Advanced Innovation Center for Structural Biology and Frontier Research Center for Biological Structure, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.
- Tsinghua-Peking Center for Life Sciences, Beijing 100084, China.
| |
|
21
|
Wang J. Integrative analyses of transcriptome data reveal the mechanisms of post-transcriptional regulation. Brief Funct Genomics 2021; 20:207-212. [PMID: 33615339 DOI: 10.1093/bfgp/elab004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/01/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Post-transcriptional processing of RNAs plays important roles in a variety of physiological and pathological processes. These processes can be precisely controlled by a series of RNA-binding proteins and cotranscriptionally regulated by transcription factors as well as histone modifications. With the rapid development of high-throughput sequencing techniques, multiomics data have been broadly used to study the mechanisms underlying important biological processes. However, how to use these high-throughput sequencing data to elucidate the fundamental regulatory roles of post-transcriptional processes remains a great challenge. This review summarizes the regulatory mechanisms of post-transcriptional processes and the general principles and approaches to dissect these mechanisms by integrating multiomics data as well as public resources.
Affiliation(s)
- Jinkai Wang
- Department of Medical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China; Center for Stem Cell Biology and Tissue Engineering, Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Sun Yat-sen University, Guangzhou, China; RNA Biomedical Institute, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
| |
|
22
|
Garrido-Martín D, Borsari B, Calvo M, Reverter F, Guigó R. Identification and analysis of splicing quantitative trait loci across multiple tissues in the human genome. Nat Commun 2021; 12:727. [PMID: 33526779 PMCID: PMC7851174 DOI: 10.1038/s41467-020-20578-2] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 05/11/2020] [Accepted: 12/02/2020] [Indexed: 12/13/2022] Open
Abstract
Alternative splicing (AS) is a fundamental step in eukaryotic mRNA biogenesis. Here, we develop an efficient and reproducible pipeline for the discovery of genetic variants that affect AS (splicing QTLs, sQTLs). We use it to analyze the GTEx dataset, generating a comprehensive catalog of sQTLs in the human genome. Downstream analysis of this catalog provides insight into the mechanisms underlying splicing regulation. We report that a core set of sQTLs is shared across multiple tissues. sQTLs often target the global splicing pattern of genes, rather than individual splicing events. Many also affect the expression of the same or other genes, uncovering regulatory loci that act through different mechanisms. sQTLs tend to be located in post-transcriptionally spliced introns, which would function as hotspots for splicing regulation. While many variants affect splicing patterns by altering the sequence of splice sites, many more modify the binding sites of RNA-binding proteins. Genetic variants affecting splicing can have a stronger phenotypic impact than those affecting gene expression.
Affiliation(s)
- Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain.
| | - Beatrice Borsari
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain
| | - Miquel Calvo
- Section of Statistics, Faculty of Biology, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
| | - Ferran Reverter
- Section of Statistics, Faculty of Biology, Universitat de Barcelona (UB), Av. Diagonal 643, Barcelona, 08028, Spain
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Catalonia, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.
| |
|
23
|
Amoah K, Hsiao YHE, Bahn JH, Sun Y, Burghard C, Tan BX, Yang EW, Xiao X. Allele-specific alternative splicing and its functional genetic variants in human tissues. Genome Res 2021; 31:359-371. [PMID: 33452016 PMCID: PMC7919445 DOI: 10.1101/gr.265637.120] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/05/2020] [Accepted: 01/14/2021] [Indexed: 02/07/2023]
Abstract
Alternative splicing is an RNA processing mechanism that affects most genes in humans, contributing to disease mechanisms and phenotypic diversity. The regulation of splicing involves an intricate network of cis-regulatory elements and trans-acting factors. Because cis-regulatory elements are highly sequence-specific, cis-regulation of splicing can be altered by genetic variants, significantly affecting splicing outcomes. Recently, multiple methods have been applied to understanding the regulatory effects of genetic variants on splicing. However, it is still challenging to go beyond apparent association to pinpoint functional variants. To fill this gap, we utilized large-scale data sets of the Genotype-Tissue Expression (GTEx) project to study genetically modulated alternative splicing (GMAS) via identification of allele-specific splicing events. We demonstrate that GMAS events are shared across tissues and individuals more often than expected by chance, consistent with their genetically driven nature. Moreover, although the allelic bias of GMAS exons varies across samples, the degree of variation is similar across tissues versus individuals. Thus, genetic background drives the GMAS pattern to a similar degree as tissue-specific splicing mechanisms. Leveraging the genetically driven nature of GMAS, we developed a new method to predict functional splicing-altering variants, built upon a genotype-phenotype concordance model across samples. Complemented by experimental validations, this method predicted >1000 functional variants, many of which may alter RNA-protein interactions. Lastly, 72% of GMAS-associated SNPs were in linkage disequilibrium with GWAS-reported SNPs, and such association was enriched in tissues of relevance for specific traits/diseases. Our study enables a comprehensive view of genetically driven splicing variations in human tissues.
Affiliation(s)
- Kofi Amoah
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California 90095, USA
| | - Yun-Hua Esther Hsiao
- Department of Bioengineering, University of California, Los Angeles, California 90095, USA
| | - Jae Hoon Bahn
- Department of Integrative Biology and Physiology, University of California, Los Angeles, California 90095, USA
| | - Yiwei Sun
- Department of Integrative Biology and Physiology, University of California, Los Angeles, California 90095, USA
| | - Christina Burghard
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California 90095, USA
| | - Boon Xin Tan
- Department of Integrative Biology and Physiology, University of California, Los Angeles, California 90095, USA
| | - Ei-Wen Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, California 90095, USA
| | - Xinshu Xiao
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California 90095, USA; Department of Bioengineering, University of California, Los Angeles, California 90095, USA; Department of Integrative Biology and Physiology, University of California, Los Angeles, California 90095, USA; Molecular Biology Institute, University of California, Los Angeles, California 90095, USA; Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, California 90095, USA
| |
|
24
|
Chan TW, Fu T, Bahn JH, Jun HI, Lee JH, Quinones-Valdez G, Cheng C, Xiao X. RNA editing in cancer impacts mRNA abundance in immune response pathways. Genome Biol 2020; 21:268. [PMID: 33106178 PMCID: PMC7586670 DOI: 10.1186/s13059-020-02171-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/10/2020] [Accepted: 09/25/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND: RNA editing generates modifications to the RNA sequences, thereby increasing protein diversity and shaping various layers of gene regulation. Recent studies have revealed global shifts in editing levels across many cancer types, as well as a few specific mechanisms implicating individual sites in tumorigenesis or metastasis. However, most tumor-associated sites, predominantly in noncoding regions, have unknown functional relevance. RESULTS: Here, we carry out integrative analysis of RNA editing profiles between epithelial and mesenchymal tumors, since epithelial-mesenchymal transition is a key paradigm for metastasis. We identify distinct editing patterns between epithelial and mesenchymal tumors in seven cancer types using TCGA data, an observation further supported by single-cell RNA sequencing data and ADAR perturbation experiments in cell culture. Through computational analyses and experimental validations, we show that differential editing sites between epithelial and mesenchymal phenotypes function by regulating mRNA abundance of their respective genes. Our analysis of RNA-binding proteins reveals ILF3 as a potential regulator of this process, supported by experimental validations. Consistent with the known roles of ILF3 in immune response, epithelial-mesenchymal differential editing sites are enriched in genes involved in immune and viral processes. The strongest target of editing-dependent ILF3 regulation is the transcript encoding PKR, a crucial player in immune and viral response. CONCLUSIONS: Our study reports widespread differences in RNA editing between epithelial and mesenchymal tumors and a novel mechanism of editing-dependent regulation of mRNA abundance. It reveals the broad impact of RNA editing in cancer and its relevance to cancer-related immune pathways.
Affiliation(s)
- Tracey W Chan
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Ting Fu
- Molecular, Cellular and Integrative Physiology Interdepartmental Program, UCLA, Los Angeles, CA, USA
| | - Jae Hoon Bahn
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
| | - Hyun-Ik Jun
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
| | - Jae-Hyung Lee
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
- Department of Life and Nanopharmaceutical Sciences & Oral Microbiology, School of Dentistry, Kyung Hee University, Seoul, South Korea
| | | | - Chonghui Cheng
- Lester & Sue Smith Breast Center & Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Xinshu Xiao
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Molecular, Cellular and Integrative Physiology Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
- Molecular Biology Institute, UCLA, Los Angeles, CA, USA
- Institute for Quantitative and Computational Sciences, UCLA, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, UCLA, Los Angeles, CA, USA
|
25
|
Tubeuf H, Charbonnier C, Soukarieh O, Blavier A, Lefebvre A, Dauchel H, Frebourg T, Gaildrat P, Martins A. Large-scale comparative evaluation of user-friendly tools for predicting variant-induced alterations of splicing regulatory elements. Hum Mutat 2020; 41:1811-1829. [PMID: 32741062 DOI: 10.1002/humu.24091] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/05/2020] [Revised: 07/11/2020] [Accepted: 07/26/2020] [Indexed: 12/20/2022]
Abstract
Discriminating which nucleotide variants cause disease or contribute to phenotypic traits remains a major challenge in human genetics. In theory, any intragenic variant can potentially affect RNA splicing by altering splicing regulatory elements (SREs). However, these alterations are often ignored mainly because pioneer SRE predictors have proved inefficient. Here, we report the first large-scale comparative evaluation of four user-friendly SRE-dedicated algorithms (QUEPASA, HEXplorer, SPANR, and HAL) tested both as standalone tools and in multiple combined ways based on two independent benchmark datasets adding up to >1,300 exonic variants studied at the messenger RNA level and mapping to 89 different disease-causing genes. These methods display good predictive power, based on decision thresholds derived from the receiver operating characteristics curve analyses, with QUEPASA and HAL having the best accuracies either as standalone or in combination. Still, overall there was a tight race between the four predictors, suggesting that all methods may be of use. Additionally, QUEPASA and HEXplorer may be beneficial as well for predicting variant-induced creation of pseudoexons deep within introns. Our study highlights the potential of SRE predictors as filtering tools for identifying disease-causing candidates among the plethora of variants detected by high-throughput DNA sequencing and provides guidance for their use in genomic medicine settings.
Affiliation(s)
- Hélène Tubeuf
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Interactive Biosoftware, Rouen, France
| | - Camille Charbonnier
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Omar Soukarieh
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | | | - Arnaud Lefebvre
- Computer Science, Information Processing and Systems Laboratory, UNIROUEN, Normandie University, Mont-Saint-Aignan, France
| | - Hélène Dauchel
- Computer Science, Information Processing and Systems Laboratory, UNIROUEN, Normandie University, Mont-Saint-Aignan, France
| | - Thierry Frebourg
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Department of Genetics, University Hospital, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Pascaline Gaildrat
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Alexandra Martins
- Inserm U1245, UNIROUEN, Normandie University, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
|
26
|
Xu S, Feng W, Lu Z, Yu CY, Shao W, Nakshatri H, Reiter JL, Gao H, Chu X, Wang Y, Liu Y. regSNPs-ASB: A Computational Framework for Identifying Allele-Specific Transcription Factor Binding From ATAC-seq Data. Front Bioeng Biotechnol 2020; 8:886. [PMID: 32850739 PMCID: PMC7405637 DOI: 10.3389/fbioe.2020.00886] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/06/2020] [Accepted: 07/09/2020] [Indexed: 12/21/2022] Open
Abstract
Expression quantitative trait loci (eQTL) analysis is useful for identifying genetic variants correlated with gene expression; however, it cannot distinguish between causal and nearby non-functional variants. Because the majority of disease-associated SNPs are located in regulatory regions, they can impact allele-specific binding (ASB) of transcription factors and result in differential expression of the target gene alleles. In this study, our aim was to identify functional single-nucleotide polymorphisms (SNPs) that alter transcriptional regulation and thus potentially impact cellular function. Here, we present regSNPs-ASB, a generalized linear model-based approach to identify regulatory SNPs that are located in transcription factor binding sites. The input for this model includes ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) raw read counts from heterozygous loci, where differential transposase-cleavage patterns between two alleles indicate preferential transcription factor binding to one of the alleles. Using regSNPs-ASB, we identified 53 regulatory SNPs in human MCF-7 breast cancer cells and 125 regulatory SNPs in human mesenchymal stem cells (MSC). By integrating the regSNPs-ASB output with RNA-seq experimental data and publicly available chromatin interaction data from MCF-7 cells, we found that these 53 regulatory SNPs were associated with 74 potential target genes and that 32 (43%) of these genes showed significant allele-specific expression. By comparing all of the MCF-7 and MSC regulatory SNPs to the eQTLs in the Genome-Tissue Expression (GTEx) Project database, we found that 30% (16/53) of the regulatory SNPs in MCF-7 and 43% (52/122) of the regulatory SNPs in MSC were also in eQTL regions. The enrichment of regulatory SNPs in eQTLs indicated that many of them are likely responsible for allelic differences in gene expression (chi-square test, p-value < 0.01).
In summary, we conclude that regSNPs-ASB is a useful tool for identifying causal variants from ATAC-seq data. This new computational tool will enable efficient prioritization of genetic variants identified as eQTL for further studies to validate their causal regulatory function. Ultimately, identifying causal genetic variants will further our understanding of the underlying molecular mechanisms of disease and the eventual development of potential therapeutic targets.
Affiliation(s)
- Siwen Xu
- Institute of Intelligent System and Bioinformatics, College of Automation, Harbin Engineering University, Harbin, China
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Weixing Feng
- Institute of Intelligent System and Bioinformatics, College of Automation, Harbin Engineering University, Harbin, China
| | - Zixiao Lu
- Regenstrief Institute, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Christina Y Yu
- Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
| | - Wei Shao
- Regenstrief Institute, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Harikrishna Nakshatri
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Jill L Reiter
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Hongyu Gao
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Xiaona Chu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Yue Wang
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
| | - Yunlong Liu
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, United States
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
|
27
|
Van Nostrand EL, Pratt GA, Yee BA, Wheeler EC, Blue SM, Mueller J, Park SS, Garcia KE, Gelboin-Burkhart C, Nguyen TB, Rabano I, Stanton R, Sundararaman B, Wang R, Fu XD, Graveley BR, Yeo GW. Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins. Genome Biol 2020; 21:90. [PMID: 32252787 PMCID: PMC7137325 DOI: 10.1186/s13059-020-01982-9] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 10/14/2019] [Accepted: 03/03/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: A critical step in uncovering rules of RNA processing is to study the in vivo regulatory networks of RNA binding proteins (RBPs). Crosslinking and immunoprecipitation (CLIP) methods enable mapping RBP targets transcriptome-wide, but methodological differences present challenges to large-scale analysis across datasets. The development of enhanced CLIP (eCLIP) enabled the mapping of targets for 150 RBPs in K562 and HepG2, creating a unique resource of RBP interactomes profiled with a standardized methodology in the same cell types.
RESULTS: Our analysis of 223 eCLIP datasets reveals a range of binding modalities, including highly resolved positioning around splicing signals and mRNA untranslated regions that associate with distinct RBP functions. Quantification of enrichment for repetitive and abundant multicopy elements reveals 70% of RBPs have enrichment for non-mRNA element classes, enables identification of novel ribosomal RNA processing factors and sites, and suggests that association with retrotransposable elements reflects multiple RBP mechanisms of action. Analysis of spliceosomal RBPs indicates that eCLIP resolves AQR association after intronic lariat formation, enabling identification of branch points with single-nucleotide resolution, and provides genome-wide validation for a branch point-based scanning model for 3' splice site recognition. Finally, we show that eCLIP peak co-occurrences across RBPs enable the discovery of novel co-interacting RBPs.
CONCLUSIONS: This work reveals novel insights into RNA biology by integrated analysis of eCLIP profiling of 150 RBPs with distinct functions. Further, our quantification of both mRNA and other element association will enable further research to identify novel roles of RBPs in regulating RNA processing.
Affiliation(s)
- Eric L Van Nostrand
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Gabriel A Pratt
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Brian A Yee
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Emily C Wheeler
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Steven M Blue
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Jasmine Mueller
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Samuel S Park
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Keri E Garcia
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Chelsea Gelboin-Burkhart
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Thai B Nguyen
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ines Rabano
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Rebecca Stanton
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Balaji Sundararaman
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Ruth Wang
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Xiang-Dong Fu
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
| | - Gene W Yeo
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
|
28
|
Teng H, Wei W, Li Q, Xue M, Shi X, Li X, Mao F, Sun Z. Prevalence and architecture of posttranscriptionally impaired synonymous mutations in 8,320 genomes across 22 cancer types. Nucleic Acids Res 2020; 48:1192-1205. [PMID: 31950163 PMCID: PMC7026592 DOI: 10.1093/nar/gkaa019] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/14/2019] [Accepted: 01/07/2020] [Indexed: 02/06/2023] Open
Abstract
Somatic synonymous mutations are among the most frequent genetic variants occurring in the coding region of cancer genomes, yet their contributions to cancer development remain largely unknown. To assess whether synonymous mutations involved in post-transcriptional regulation contribute to the genetic etiology of cancers, we collected whole exome data from 8,320 patients across 22 cancer types. By employing our developed algorithm, PIVar, we identified a total of 22,948 posttranscriptionally impaired synonymous SNVs (pisSNVs) spanning 2,042 genes. In addition, 35 RNA binding proteins impacted by these identified pisSNVs were significantly enriched. Remarkably, we discovered a markedly elevated ratio of somatic pisSNVs across all 22 cancer types, and a high pisSNV ratio was associated with worse patient survival in five cancer types. Intriguingly, several well-established cancer genes, including PTEN, RB1 and PIK3CA, appeared to contribute to tumorigenesis at both protein function and posttranscriptional regulation levels, whereas some pisSNV-hosted genes, including UBR4, EP400 and INTS1, exerted their function during carcinogenesis mainly via posttranscriptional mechanisms. Moreover, we predicted three drugs associated with two pisSNVs, and numerous compounds associated with the expression signature of pisSNV-hosted genes. Our study reveals the prevalence and clinical relevance of pisSNVs in cancers, and emphasizes the importance of considering posttranscriptionally impaired synonymous mutations in cancer biology.
Affiliation(s)
- Huajing Teng
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing 100142, China
| | - Wenqing Wei
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qinglan Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Meiying Xue
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiaohui Shi
- Sino-Danish college, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xianfeng Li
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Key laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing 100142, China
| | - Fengbiao Mao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Zhongsheng Sun
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
|
29
|
Ghanbari M, Ohler U. Deep neural networks for interpreting RNA-binding protein target preferences. Genome Res 2020; 30:214-226. [PMID: 31992613 PMCID: PMC7050519 DOI: 10.1101/gr.247494.118] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/18/2018] [Accepted: 01/07/2020] [Indexed: 11/29/2022]
Abstract
Deep learning has become a powerful paradigm for analyzing the binding sites of regulatory factors including RNA-binding proteins (RBPs), owing to its ability to learn complex features from possibly multiple sources of raw data. However, the interpretability of these models, which is crucial to improve our understanding of RBP binding preferences and functions, has not yet been investigated in significant detail. We have designed a multitask and multimodal deep neural network for characterizing in vivo RBP targets. The model incorporates not only the sequence but also the region type of the binding sites as input, which boosts prediction performance. To interpret the model, we quantified the contribution of the input features to the predictive score of each RBP. Learning across multiple RBPs at once, we are able to avoid experimental biases and to identify the RNA sequence motifs and transcript context patterns that are the most important for the predictions of each individual RBP. Our findings are consistent with known motifs and binding behaviors and can provide new insights about the regulatory functions of RBPs.
Affiliation(s)
- Mahsa Ghanbari
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany
| | - Uwe Ohler
- The Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 10115 Berlin, Germany
- Department of Biology, Humboldt Universität zu Berlin, 10117 Berlin, Germany
- Department of Computer Science, Humboldt Universität zu Berlin, 10117 Berlin, Germany
|
30
|
mountainClimber Identifies Alternative Transcription Start and Polyadenylation Sites in RNA-Seq. Cell Syst 2019; 9:393-400.e6. [PMID: 31542416 DOI: 10.1016/j.cels.2019.07.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/15/2019] [Revised: 06/06/2019] [Accepted: 07/24/2019] [Indexed: 12/28/2022]
Abstract
Alternative transcription start (ATS) and alternative polyadenylation (APA) create alternative RNA isoforms and modulate many aspects of RNA expression and protein production. However, ATS and APA remain difficult to detect in RNA sequencing (RNA-seq). Here, we developed mountainClimber, a de novo cumulative-sum-based approach to identify ATS and APA as change points. Unlike many existing methods, mountainClimber runs on a single sample and identifies multiple ATS or APA sites anywhere in the transcript. We analyzed 2,342 GTEx samples (36 tissues, 215 individuals) and found that tissue type is the predominant driver of transcript end variations. 75% and 65% of genes exhibited differential APA and ATS across tissues, respectively. In particular, testis displayed longer 5' untranslated regions (UTRs) and shorter 3' UTRs, often in genes related to testis-specific biology. Overall, we report the largest study of transcript ends across human tissues to our knowledge. mountainClimber is available at github.com/gxiaolab/mountainClimber.
|