1
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 DOI: 10.1101/gad.351800.124] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
  - Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
2
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2024:10.1038/s41576-024-00774-2. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
Affiliation(s)
- Charlotte Capitanchik
  - The Francis Crick Institute, London, UK
  - UK Dementia Research Institute at King's College London, London, UK
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
  - The Francis Crick Institute, London, UK
  - UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Jernej Ule
  - The Francis Crick Institute, London, UK
  - UK Dementia Research Institute at King's College London, London, UK
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
  - National Institute of Chemistry, Ljubljana, Slovenia
3
Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature 2024; 634:995-1003. [PMID: 39322666 PMCID: PMC11499273 DOI: 10.1038/s41586-024-07966-0] [Received: 10/27/2023] [Accepted: 08/20/2024] [Indexed: 09/27/2024]
Abstract
There are more ways to synthesize a 100-amino acid (aa) protein (20^100) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces (ref. 1). However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10^10, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
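The kind of energy model described in this abstract can be illustrated with a minimal sketch. It assumes a two-state (folded/unfolded) Boltzmann relationship between folding free energy and the measured phenotype, which is a common choice but is not stated in the abstract. The function names, mutation labels, and energy values are hypothetical and serve only as illustration.

```python
import math

R_KCAL = 1.987e-3          # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15       # assumed temperature in kelvin
RT = R_KCAL * TEMPERATURE  # about 0.59 kcal/mol

def fraction_folded(dg_folding):
    """Two-state Boltzmann model (assumption): nonlinear map from
    folding free energy (kcal/mol, negative = stable) to fraction folded."""
    return 1.0 / (1.0 + math.exp(dg_folding / RT))

def genotype_dg(dg_wildtype, mutations, ddg, couplings=None):
    """Additive free energy changes plus optional sparse pairwise couplings.

    ddg maps a mutation label to its free energy change; couplings maps a
    sorted pair of mutation labels to an extra pairwise energy term.
    """
    total = dg_wildtype + sum(ddg[m] for m in mutations)
    if couplings:
        ordered = sorted(mutations)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                total += couplings.get((first, second), 0.0)
    return total

# Hypothetical example values, not taken from the paper.
ddg = {"A10G": 0.5, "L25P": 1.0}
couplings = {("A10G", "L25P"): 0.3}
dg = genotype_dg(-2.0, ["A10G", "L25P"], ddg, couplings)
print(f"dG = {dg:.2f} kcal/mol, fraction folded = {fraction_folded(dg):.3f}")
```

The only nonlinearity in this sketch is the final mapping from free energy to phenotype; every parameter remains a directly readable energy term, which is what makes such a model interpretable.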
Affiliation(s)
- Andre J Faure
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - ALLOX, Barcelona, Spain
- Aina Martí-Aranda
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Cristina Hidalgo-Carcedo
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Antoni Beltran
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Jörn M Schmiedel
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - factorize.bio, Berlin, Germany
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
4
Lin TC, Tsai CH, Shiau CK, Huang JH, Tsai HK. Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning. BMC Genomics 2024; 25:830. [PMID: 39227799 PMCID: PMC11373144 DOI: 10.1186/s12864-024-10667-7] [Received: 10/30/2022] [Accepted: 07/25/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to the transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulations around the exon/intron regions are known, the relationship between promoter-bound transcription factors and the downstream alternative splicing largely remains unexplored. RESULTS In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and the splicing patterns. We curated a fine dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns based on TFBS occupancies, limitations in generalizing predictions of the splicing forms of singleton genes across diverse tissues were observed upon careful examination using different cross-validation methods. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that the convolutional neural network (CNN) models, trained on TF binding changes in the promoters, can predict the changes in splicing patterns. Furthermore, a systematic in silico substitution analysis on the CNN models highlighted several potential splicing regulators. Notably, through empirical validation with K562 CTCFL shRNA knock-down data, we showed the significant role of CTCFL in splicing regulation. CONCLUSION In conclusion, our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulations.
Affiliation(s)
- Tzu-Chieh Lin
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Cheng-Hung Tsai
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Cheng-Kai Shiau
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
- Jia-Hsin Huang
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
  - Taiwan AI Labs & Foundation, Taipei, 10351, Taiwan
- Huai-Kuang Tsai
  - Institute of Information Science, Academia Sinica, Taipei, 11529, Taiwan
  - Taiwan AI Labs & Foundation, Taipei, 10351, Taiwan
5
Wang D, Gazzara MR, Jewell S, Wales-McGrath B, Brown CD, Choi PS, Barash Y. A Deep Dive into Statistical Modeling of RNA Splicing QTLs Reveals New Variants that Explain Neurodegenerative Disease. bioRxiv 2024:2024.09.01.610696. [PMID: 39282456 PMCID: PMC11398334 DOI: 10.1101/2024.09.01.610696] [Indexed: 09/22/2024]
Abstract
Genome-wide association studies (GWAS) have identified thousands of putative disease causing variants with unknown regulatory effects. Efforts to connect these variants with splicing quantitative trait loci (sQTLs) have provided functional insights, yet sQTLs reported by existing methods cannot explain many GWAS signals. We show current sQTL modeling approaches can be improved by considering alternative splicing representation, model calibration, and covariate integration. We then introduce MAJIQTL, a new pipeline for sQTL discovery. MAJIQTL includes two new statistical methods: a weighted multiple testing approach for sGene discovery and a model for sQTL effect size inference to improve variant prioritization. By applying MAJIQTL to GTEx, we find significantly more sGenes harboring sQTLs with functional significance. Notably, our analysis implicates the novel variant rs582283 in Alzheimer's disease. Using antisense oligonucleotides, we validate this variant's effect by blocking the implicated YBX3 binding site, leading to exon skipping in the gene MS4A3.
Affiliation(s)
- David Wang
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- Matthew R. Gazzara
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
- San Jewell
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania
- Peter S. Choi
  - Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania
  - Division of Cancer Pathobiology, The Children’s Hospital of Philadelphia
- Yoseph Barash
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania
  - Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania
6
Xu C, Bao S, Wang Y, Li W, Chen H, Shen Y, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. Genome Res 2024; 34:1052-1065. [PMID: 39060028 PMCID: PMC11368187 DOI: 10.1101/gr.279044.124] [Received: 01/28/2024] [Accepted: 07/18/2024] [Indexed: 07/28/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes, and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice outperformed consistently. DeltaSplice predicted ∼15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders (NDDs), including 19 genes with recurrent splicing-altering mutations. Integration of splicing-altering mutations with other types of de novo mutation burdens allowed the prediction of eight novel NDD-risk genes. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Suying Bao
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Ye Wang
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
- Wenxing Li
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Hao Chen
  - Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
- Tao Jiang
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  - Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA
- Chaolin Zhang
  - Department of Systems Biology, Columbia University, New York, New York 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
7
Chen SK, Liu J, Van Nynatten A, Tudor-Price BM, Chang BSW. Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods. J Mol Evol 2024:10.1007/s00239-024-10179-8. [PMID: 38886207 DOI: 10.1007/s00239-024-10179-8] [Received: 02/28/2024] [Accepted: 05/20/2024] [Indexed: 06/20/2024]
Abstract
Empirical studies of genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, in elucidating the space of possible genotypes accessible through mutations in a landscape of phenotypes and fitness effects. Yet, comprehensively mapping molecular fitness landscapes remains challenging since all possible combinations of amino acid substitutions for even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate such multi-mutational studies, the size of MAVEs has grown to the point where a priori determination of sampling requirements is needed. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. For this, we show quantitatively that marginal gains in sampling efficiency demand increasingly greater sampling effort when sampling for nucleotide sequences over their encoded amino acid equivalents. We present a new library construction protocol that efficiently maximizes sequence variation, and demonstrate using ultradeep sequencing that the library encodes virtually all possible combinations of mutations within the experimental design. Insights learned from our analyses together with the methodological advances reported herein are immediately applicable toward pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
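As a rough illustration of why sampling requirements grow quickly with library size, the sketch below estimates how many reads are needed to observe a target fraction of variants at least once. It assumes every variant is equally abundant and sampled independently, which is a simplification and not the calculation proposed by the authors. The function names and the three-site example are hypothetical.

```python
import math

def expected_coverage(n_variants, n_reads):
    """Expected fraction of variants seen at least once, assuming a
    perfectly uniform library (simplifying assumption)."""
    # 1 - (1 - 1/V)^N, written with log1p/expm1 for numerical stability.
    return -math.expm1(n_reads * math.log1p(-1.0 / n_variants))

def reads_for_coverage(n_variants, target_fraction):
    """Smallest read count whose expected coverage reaches the target."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    if n_variants < 2:
        return 1
    return math.ceil(
        math.log(1.0 - target_fraction) / math.log1p(-1.0 / n_variants)
    )

# Three fully randomized sites, counted as amino acids versus codons.
amino_acid_variants = 20 ** 3
codon_variants = 64 ** 3
for label, size in [("amino acid", amino_acid_variants), ("codon", codon_variants)]:
    reads = reads_for_coverage(size, 0.95)
    print(f"{label}: {size} variants need about {reads} reads for 95% coverage")
```

Real libraries are rarely uniform, so estimates of this kind are best read as lower bounds on the sequencing depth required.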
Affiliation(s)
- Steven K Chen
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Jing Liu
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexander Van Nynatten
  - Department of Biological Science, University of Toronto Scarborough, Toronto, ON, Canada
- Belinda S W Chang
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
8
McCue K, Burge CB. An interpretable model of pre-mRNA splicing for animal and plant genes. Science Advances 2024; 10:eadn1547. [PMID: 38718117 PMCID: PMC11078188 DOI: 10.1126/sciadv.adn1547] [Received: 11/25/2023] [Accepted: 04/04/2024] [Indexed: 05/12/2024]
Abstract
Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.
Affiliation(s)
- Kayla McCue
  - Computational and Systems Biology PhD Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Christopher B. Burge
  - Computational and Systems Biology PhD Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
9
Faure AJ, Lehner B, Miró Pina V, Serrano Colome C, Weghorn D. An extension of the Walsh-Hadamard transform to calculate and model epistasis in genetic landscapes of arbitrary shape and complexity. PLoS Comput Biol 2024; 20:e1012132. [PMID: 38805561 PMCID: PMC11161127 DOI: 10.1371/journal.pcbi.1012132] [Received: 06/19/2023] [Revised: 06/07/2024] [Accepted: 05/04/2024] [Indexed: 05/30/2024] Open
Abstract
Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) is an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
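For orientation, the classical two-allele Walsh-Hadamard transform that this paper extends can be sketched in a few lines. This is only the standard biallelic case with a plain 1/2^n normalization, not the multiallelic extension or the exact scaling used by the authors; sign and scaling conventions differ between papers. Genotypes are assumed to be indexed by their binary code (00, 01, 10, 11 for two sites), and the function names are hypothetical.

```python
import numpy as np

def hadamard(n_sites):
    """Sylvester-type Hadamard matrix of size 2^n built by Kronecker products."""
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    matrix = np.array([[1.0]])
    for _ in range(n_sites):
        matrix = np.kron(matrix, base)
    return matrix

def walsh_coefficients(phenotypes):
    """Background-averaged coefficients for a complete biallelic landscape."""
    y = np.asarray(phenotypes, dtype=float)
    n_sites = y.size.bit_length() - 1
    if y.size != 2 ** n_sites:
        raise ValueError("need one phenotype for each of the 2^n genotypes")
    return hadamard(n_sites) @ y / y.size

def inverse_walsh(coefficients):
    """Recover phenotypes; the unnormalized Hadamard matrix is its own inverse up to 2^n."""
    c = np.asarray(coefficients, dtype=float)
    n_sites = c.size.bit_length() - 1
    return hadamard(n_sites) @ c

# Two-site landscape in the order 00, 01, 10, 11 (made-up values).
additive = walsh_coefficients([0.0, 1.0, 2.0, 3.0])
epistatic = walsh_coefficients([0.0, 1.0, 2.0, 5.0])
print("pairwise term, additive landscape: ", additive[3])
print("pairwise term, epistatic landscape:", epistatic[3])
```

The last coefficient is the pairwise interaction term: it vanishes when the two single-mutant effects simply add up and is nonzero when the double mutant deviates from that expectation.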
Affiliation(s)
- Andre J. Faure
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Ben Lehner
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Verónica Miró Pina
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Claudia Serrano Colome
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Donate Weghorn
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
10
Xu C, Bao S, Chen H, Jiang T, Zhang C. Reference-informed prediction of alternative splicing and splicing-altering mutations from sequences. bioRxiv 2024:2024.03.22.586363. [PMID: 38586002 PMCID: PMC10996483 DOI: 10.1101/2024.03.22.586363] [Indexed: 04/09/2024]
Abstract
Alternative splicing plays a crucial role in protein diversity and gene expression regulation in higher eukaryotes and mutations causing dysregulated splicing underlie a range of genetic diseases. Computational prediction of alternative splicing from genomic sequences not only provides insight into gene-regulatory mechanisms but also helps identify disease-causing mutations and drug targets. However, the current methods for the quantitative prediction of splice site usage still have limited accuracy. Here, we present DeltaSplice, a deep neural network model optimized to learn the impact of mutations on quantitative changes in alternative splicing from the comparative analysis of homologous genes. The model architecture enables DeltaSplice to perform "reference-informed prediction" by incorporating the known splice site usage of a reference gene sequence to improve its prediction on splicing-altering mutations. We benchmarked DeltaSplice and several other state-of-the-art methods on various prediction tasks, including evolutionary sequence divergence on lineage-specific splicing and splicing-altering mutations in human populations and neurodevelopmental disorders, and demonstrated that DeltaSplice outperformed consistently. DeltaSplice predicted ~15% of splicing quantitative trait loci (sQTLs) in the human brain as causal splicing-altering variants. It also predicted splicing-altering de novo mutations outside the splice sites in a subset of patients affected by autism and other neurodevelopmental disorders, including 19 genes with recurrent splicing-altering mutations. Among the new candidate disease risk genes, MFN1 is involved in mitochondria fusion, which is frequently disrupted in autism patients. Our work expanded the capacity of in silico splicing models with potential applications in genetic diagnosis and the development of splicing-based precision medicine.
Affiliation(s)
- Chencheng Xu
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  - Present address: Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Suying Bao
  - Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
  - Present address: Regeneron Pharmaceuticals, Terrytown, NY 10591, USA
- Hao Chen
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
  - Present address: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tao Jiang
  - Bioinformatics Division, BNRIST, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  - Department of Computer Science and Engineering, University of California, Riverside, CA 92521, USA
- Chaolin Zhang
  - Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
11
Ishigami Y, Wong MS, Martí-Gómez C, Ayaz A, Kooshkbaghi M, Hanson SM, McCandlish DM, Krainer AR, Kinney JB. Specificity, synergy, and mechanisms of splice-modifying drugs. Nat Commun 2024; 15:1880. [PMID: 38424098 PMCID: PMC10904865 DOI: 10.1038/s41467-024-46090-5] [Received: 02/22/2023] [Accepted: 02/10/2024] [Indexed: 03/02/2024] Open
Abstract
Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5' splice site sequences, suggest that branaplam recognizes 5' splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.
Affiliation(s)
- Yuma Ishigami
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Mandy S Wong
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
  - Beam Therapeutics, Cambridge, MA, 02142, USA
- Andalus Ayaz
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Mahdi Kooshkbaghi
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
  - The Estée Lauder Companies, New York, NY, 10153, USA
- Adrian R Krainer
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Justin B Kinney
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
12
Tao Y, Zhang Q, Wang H, Yang X, Mu H. Alternative splicing and related RNA binding proteins in human health and disease. Signal Transduct Target Ther 2024; 9:26. [PMID: 38302461 PMCID: PMC10835012 DOI: 10.1038/s41392-024-01734-2] [Received: 05/07/2023] [Revised: 12/18/2023] [Accepted: 12/27/2023] [Indexed: 02/03/2024] Open
Abstract
Alternative splicing (AS) serves as a pivotal mechanism in transcriptional regulation, engendering transcript diversity, and modifications in protein structure and functionality. Across varying tissues, developmental stages, or under specific conditions, AS gives rise to distinct splice isoforms. This implies that these isoforms possess unique temporal and spatial roles, thereby associating AS with standard biological activities and diseases. Among these, AS-related RNA-binding proteins (RBPs) play an instrumental role in regulating alternative splicing events. Under physiological conditions, the diversity of proteins mediated by AS influences the structure, function, interaction, and localization of proteins, thereby participating in the differentiation and development of an array of tissues and organs. Under pathological conditions, alterations in AS are linked with various diseases, particularly cancer. These changes can lead to modifications in gene splicing patterns, culminating in changes or loss of protein functionality. For instance, in cancer, abnormalities in AS and RBPs may result in aberrant expression of cancer-associated genes, thereby promoting the onset and progression of tumors. AS and RBPs are also associated with numerous neurodegenerative diseases and autoimmune diseases. Consequently, the study of AS across different tissues holds significant value. This review provides a detailed account of the recent advancements in the study of alternative splicing and AS-related RNA-binding proteins in tissue development and diseases, which aids in deepening the understanding of gene expression complexity and offers new insights and methodologies for precision medicine.
Affiliation(s)
- Yining Tao
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 200000, Shanghai, China
- Shanghai Bone Tumor Institution, 200000, Shanghai, China
| | - Qi Zhang
- Department of Biochemistry and Molecular Cell Biology, Shanghai Key Laboratory for Tumor Microenvironment and Inflammation, Shanghai Jiao Tong University School of Medicine, 200000, Shanghai, China
| | - Haoyu Wang
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 200000, Shanghai, China
- Shanghai Bone Tumor Institution, 200000, Shanghai, China
| | - Xiyu Yang
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 200000, Shanghai, China
- Shanghai Bone Tumor Institution, 200000, Shanghai, China
| | - Haoran Mu
- Department of Orthopedics, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 200000, Shanghai, China.
- Shanghai Bone Tumor Institution, 200000, Shanghai, China.
| |
|
13
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION SpliceAI and Pangolin show the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
| |
|
14
|
Lee S, Aubee JI, Lai EC. Regulation of alternative splicing and polyadenylation in neurons. Life Sci Alliance 2023; 6:e202302000. [PMID: 37793776 PMCID: PMC10551640 DOI: 10.26508/lsa.202302000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 09/22/2023] [Accepted: 09/25/2023] [Indexed: 10/06/2023] Open
Abstract
Cell-type-specific gene expression is a fundamental feature of multicellular organisms and is achieved by combinations of regulatory strategies. Although cell-restricted transcription is perhaps the most widely studied mechanism, co-transcriptional and post-transcriptional processes are also central to the spatiotemporal control of gene functions. One general category of expression control involves the generation of multiple transcript isoforms from an individual gene, whose balance and cell specificity are frequently tightly regulated via diverse strategies. The nervous system makes particularly extensive use of cell-specific isoforms, specializing the neural function of genes that are expressed more broadly. Here, we review regulatory strategies and RNA-binding proteins that direct neural-specific isoform processing. These include various classes of alternative splicing and alternative polyadenylation events, both of which broadly diversify the neural transcriptome. Importantly, global alterations of splicing and alternative polyadenylation are characteristic of many neural pathologies, and recent genetic studies demonstrate how misregulation of individual neural isoforms can directly cause mutant phenotypes.
Affiliation(s)
- Seungjae Lee
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Joseph I Aubee
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| | - Eric C Lai
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
| |
|
15
|
Ma N, Xu H, Zhang W, Sun X, Guo R, Liu D, Zhang L, Liu Y, Zhang J, Qiao C, Chen D, Luo A, Bai J. Genome-wide analysis revealed the dysregulation of RNA binding protein-correlated alternative splicing events in myocardial ischemia reperfusion injury. BMC Med Genomics 2023; 16:251. [PMID: 37858115 PMCID: PMC10585833 DOI: 10.1186/s12920-023-01706-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023] Open
Abstract
BACKGROUND Myocardial ischemia reperfusion injury (MIRI), the tissue damage which is caused by the returning of blood supply to tissue after a period of ischemia, greatly reduces the therapeutic effect of treatment of myocardial infarction. But the underlying functional mechanisms of MIRI are still unclear. METHODS We constructed mouse models of MIRI, extracted injured and healthy myocardial tissues, and performed transcriptome sequencing experiments (RNA-seq) to systematically investigate the dysregulated transcriptome of MIRI, especially the alternative splicing (AS) regulation and RNA binding proteins (RBPs). Selected RBPs and MIRI-associated AS events were then validated by RT-qPCR experiments. RESULTS The differentially expressed gene (DEG) analyses indicated that transcriptome profiles were changed by MIRI and that DEGs' enriched functions were consistent with MIRI's dysregulated pathways. Furthermore, the AS profile was synergistically regulated and showed clear differences between the mouse model and the healthy samples. The exon skipping events significantly increased in MIRI model samples, while the opposite cassette exon events significantly decreased. According to the functional analysis, regulated alternative splicing genes (RASGs) were enriched in protein transport, cell division/cell cycle, RNA splicing, and endocytosis pathways, which were associated with the development of MIRI. Meanwhile, 493 differentially expressed RBPs (DE RBPs) were detected, most of which were correlated with the changed ratios of AS events. In addition, nine DE RBP genes were validated, including Eif5, Pdia6, Tagln2, Vasp, Zfp36l2, Grsf1, Idh2, Ndrg2, and Uqcrc1. These nine DE RBPs were correlated with RASGs enriched in translation process, cell growth and division, and endocytosis pathways, highly consistent with the functions of all RASGs. Finally, we validated the AS ratio changes of five regulated alternative splicing events (RASEs) derived from important regulatory genes, including Mtmr3, Cdc42, Cd47, Fbln2, Vegfa, and Fhl2. CONCLUSION Our study emphasized the critical roles of the dysregulated AS profiles in MIRI development, investigated the potential functions of MIRI-associated RASGs, and identified regulatory RBPs involved in AS regulation. We propose that the identified RASEs and RBPs could serve as important regulators and potential therapeutic targets in MIRI treatment in the future.
Affiliation(s)
- Ning Ma
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Hao Xu
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Weihua Zhang
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Xiaoke Sun
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Ruiming Guo
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Donghai Liu
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Liang Zhang
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Yang Liu
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Jian Zhang
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Chenhui Qiao
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China
| | - Dong Chen
- Wuhan Ruixing Biotechnology Co., Ltd, Wuhan, 430206, Hubei, P.R. China
| | - Ailing Luo
- Wuhan Ruixing Biotechnology Co., Ltd, Wuhan, 430206, Hubei, P.R. China
| | - Jingyun Bai
- Department of Nephrology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, Henan, P.R. China.
| |
|
16
|
Liao SE, Sudarshan M, Regev O. Deciphering RNA splicing logic with interpretable machine learning. Proc Natl Acad Sci U S A 2023; 120:e2221165120. [PMID: 37796983 PMCID: PMC10576025 DOI: 10.1073/pnas.2221165120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 08/29/2023] [Indexed: 10/07/2023] Open
Abstract
Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: Despite their excellent accuracy, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed uncharacterized components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Affiliation(s)
- Susan E. Liao
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
| | - Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
| | - Oded Regev
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
| |
|
17
|
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023] Open
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Yi Xing
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
18
|
Wagner A. Evolvability-enhancing mutations in the fitness landscapes of an RNA and a protein. Nat Commun 2023; 14:3624. [PMID: 37336901 PMCID: PMC10279741 DOI: 10.1038/s41467-023-39321-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 06/05/2023] [Indexed: 06/21/2023] Open
Abstract
Can evolvability (the ability to produce adaptive heritable variation) itself evolve through adaptive Darwinian evolution? If so, then Darwinian evolution may help create the conditions that enable Darwinian evolution. Here I propose a framework that is suitable to address this question with available experimental data on adaptive landscapes. I introduce the notion of an evolvability-enhancing mutation, which increases the likelihood that subsequent mutations in an evolving organism, protein, or RNA molecule are adaptive. I search for such mutations in the experimentally characterized and combinatorially complete fitness landscapes of a protein and an RNA molecule. I find that such evolvability-enhancing mutations indeed exist. They constitute a small fraction of all mutations, which shift the distribution of fitness effects of subsequent mutations towards less deleterious mutations, and increase the incidence of beneficial mutations. Evolving populations which experience such mutations can evolve significantly higher fitness. The study of evolvability-enhancing mutations opens many avenues of investigation into the evolution of evolvability.
Affiliation(s)
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland.
- The Santa Fe Institute, Santa Fe, NM, USA.
| |
|
19
|
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
| |
|
20
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. bioRxiv 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
Background Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion SpliceAI and Pangolin showed the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| |
|
21
|
Wagner N, Çelik MH, Hölzlwimmer FR, Mertes C, Prokisch H, Yépez VA, Gagneur J. Aberrant splicing prediction across human tissues. Nat Genet 2023; 55:861-870. [PMID: 37142848 DOI: 10.1038/s41588-023-01373-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/05/2022] [Accepted: 03/14/2023] [Indexed: 05/06/2023]
Abstract
Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.
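The "precision at 20% recall" figures quoted in this abstract can be read as follows: rank variants by predicted score, consider every score cutoff that recovers at least the target fraction of true aberrant-splicing events, and report the best precision among those cutoffs. The sketch below illustrates that reading only. The function name, the tie handling, and the toy scores and labels are assumptions made for illustration; they are not taken from AbSplice or from the benchmark dataset.

```python
def precision_at_recall(scores, labels, target_recall):
    """Best precision over score cutoffs whose recall is >= target_recall.

    scores: predicted scores (higher = more likely aberrant); hypothetical input.
    labels: 1 for a true aberrant-splicing event, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank variants from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(ranked):
        true_pos += label
        # Tied scores cannot be separated by a cutoff, so only evaluate
        # after the last variant of each group of equal scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        recall = true_pos / total_pos
        precision = true_pos / (i + 1)
        if recall >= target_recall:
            best = max(best, precision)
    return best


# Toy data (made up): ten variants, five of them truly aberrant.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
print(precision_at_recall(toy_scores, toy_labels, 0.2))
print(precision_at_recall(toy_scores, toy_labels, 0.6))
```

Reporting precision at a fixed recall, rather than a single accuracy figure, suits this setting because true aberrant events are rare among millions of variants, so the quantity of interest is how many of the top-ranked calls are real.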
Affiliation(s)
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
| | - Muhammed H Çelik
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
| | - Florian R Hölzlwimmer
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Christian Mertes
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
| | - Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
| | - Vicente A Yépez
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
| |
|
22
|
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
| | - Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
|
23
|
Boumpas P, Merabet S, Carnesecchi J. Integrating transcription and splicing into cell fate: Transcription factors on the block. Wiley Interdiscip Rev RNA 2023; 14:e1752. [PMID: 35899407 DOI: 10.1002/wrna.1752] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/31/2022] [Revised: 06/22/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022]
Abstract
Transcription factors (TFs) are present in all life forms and conserved across great evolutionary distances in eukaryotes. From yeast to complex multicellular organisms, they are pivotal players of cell fate decision by orchestrating gene expression at diverse molecular layers. Notably, TFs fine-tune gene expression by coordinating RNA fate at both the expression and splicing levels. They regulate alternative splicing, an essential mechanism for cell plasticity, allowing the production of many mRNA and protein isoforms in precise cell and tissue contexts. Despite this apparent role in splicing, how TFs integrate transcription and splicing to ultimately orchestrate diverse cell functions and cell fate decisions remains puzzling. We depict substantial studies in various model organisms underlining the key role of TFs in alternative splicing for promoting tissue-specific functions and cell fate. Furthermore, we emphasize recent advances describing the molecular link between the transcriptional and splicing activities of TFs. As TFs can bind both DNA and/or RNA to regulate transcription and splicing, we further discuss their flexibility and compatibility for DNA and RNA substrates. Finally, we propose several models integrating transcription and splicing activities of TFs in the coordination and diversification of cell and tissue identities. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; RNA Processing > Splicing Mechanisms.
Affiliation(s)
- Panagiotis Boumpas
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
| | - Samir Merabet
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
| | - Julie Carnesecchi
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
| |
|
24
|
Horn T, Gosliga A, Li C, Enculescu M, Legewie S. Position-dependent effects of RNA-binding proteins in the context of co-transcriptional splicing. NPJ Syst Biol Appl 2023; 9:1. [PMID: 36653378 PMCID: PMC9849329 DOI: 10.1038/s41540-022-00264-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 12/08/2022] [Indexed: 01/19/2023] Open
Abstract
Alternative splicing is an important step in eukaryotic mRNA pre-processing which increases the complexity of gene expression programs, but is frequently altered in disease. Previous work on the regulation of alternative splicing has demonstrated that splicing is controlled by RNA-binding proteins (RBPs) and by epigenetic DNA/histone modifications which affect splicing by changing the speed of polymerase-mediated pre-mRNA transcription. The interplay of these different layers of splicing regulation is poorly understood. In this paper, we derived mathematical models describing how splicing decisions in a three-exon gene are made by combinatorial spliceosome binding to splice sites during ongoing transcription. We additionally take into account the effect of a regulatory RBP and find that the RBP binding position within the sequence is a key determinant of how RNA polymerase velocity affects splicing. Based on these results, we explain paradoxical observations in the experimental literature and further derive rules explaining why the same RBP can act as inhibitor or activator of cassette exon inclusion depending on its binding position. Finally, we derive a stochastic description of co-transcriptional splicing regulation at the single-cell level and show that splicing outcomes show little noise and follow a binomial distribution despite complex regulation by a multitude of factors. Taken together, our simulations demonstrate the robustness of splicing outcomes and reveal that quantitative insights into kinetic competition of co-transcriptional events are required to fully understand this important mechanism of gene expression diversity.
Affiliation(s)
- Timur Horn
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Alison Gosliga
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany
| | - Congxin Li
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany
| | - Mihaela Enculescu
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.
| | - Stefan Legewie
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany.
| |
|
25
|
Wei H, Li X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front Genet 2023; 14:1087267. [PMID: 36713072 PMCID: PMC9878224 DOI: 10.3389/fgene.2023.1087267] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 11/02/2022] [Accepted: 01/02/2023] [Indexed: 01/13/2023] Open
Abstract
Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas, leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to oligo-synthesis technology, high-throughput phenotyping methods and deep sequencing technology. However, the possible approaches to each step of deep mutational scanning have their pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers' projects.
Affiliation(s)
- Huijin Wei
- Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
| | - Xianghua Li
- Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom
- The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China
- Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China
| |
|
26
|
Petrova V, Song R, Nordström KJV, Walter J, Wong JJL, Armstrong N, Rasko JEJ, Schmitz U. Increased chromatin accessibility facilitates intron retention in specific cell differentiation states. Nucleic Acids Res 2022; 50:11563-11579. [PMID: 36354002 PMCID: PMC9723627 DOI: 10.1093/nar/gkac994] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2022] [Revised: 10/05/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022] Open
Abstract
Dynamic intron retention (IR) in vertebrate cells is of widespread biological importance. Aberrant IR is associated with numerous human diseases including several cancers. Despite consistent reports demonstrating that intrinsic sequence features can help introns evade splicing, conflicting findings about cell type- or condition-specific IR regulation by trans-regulatory and epigenetic mechanisms demand an unbiased and systematic analysis of IR in a controlled experimental setting. We integrated matched mRNA sequencing (mRNA-Seq), whole-genome bisulfite sequencing (WGBS), nucleosome occupancy methylome sequencing (NOMe-Seq) and chromatin immunoprecipitation sequencing (ChIP-Seq) data from primary human myeloid and lymphoid cells. Using these multi-omics data and machine learning, we trained two complementary models to determine the role of epigenetic factors in the regulation of IR in cells of the innate immune system. We show that increased chromatin accessibility, as revealed by nucleosome-free regions, contributes substantially to the retention of introns in a cell-specific manner. We also confirm that intrinsic characteristics of introns are key for them to evade splicing. This study suggests an important role for chromatin architecture in IR regulation. With an increasing appreciation that pathogenic alterations are linked to RNA processing, our findings may provide useful insights for the development of novel therapeutic approaches that target aberrant splicing.
Affiliation(s)
- Veronika Petrova
- Computational BioMedicine Laboratory Centenary Institute, The University of Sydney, Camperdown 2050, Australia; Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown 2050, Australia
| | - Renhua Song
- Epigenetics and RNA Biology Program Centenary Institute, The University of Sydney, Camperdown 2050, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown 2050, Australia
| | | | - Karl J V Nordström
- Laboratory of EpiGenetics, Saarland University, Campus A2 4, D-66123 Saarbrücken, Germany
| | - Jörn Walter
- Laboratory of EpiGenetics, Saarland University, Campus A2 4, D-66123 Saarbrücken, Germany
| | - Justin J L Wong
- Epigenetics and RNA Biology Program Centenary Institute, The University of Sydney, Camperdown 2050, Australia; Faculty of Medicine and Health, The University of Sydney, Camperdown 2050, Australia
| | - Nicola J Armstrong
- Mathematics and Statistics, Curtin University, Bentley, WA 6102, Australia
| | | | | |
|
27
|
Cortés-López M, Schulz L, Enculescu M, Paret C, Spiekermann B, Quesnel-Vallières M, Torres-Diz M, Unic S, Busch A, Orekhova A, Kuban M, Mesitov M, Mulorz MM, Shraim R, Kielisch F, Faber J, Barash Y, Thomas-Tikhonenko A, Zarnack K, Legewie S, König J. High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance. Nat Commun 2022; 13:5570. [PMID: 36138008 PMCID: PMC9500061 DOI: 10.1038/s41467-022-31818-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/07/2021] [Accepted: 07/05/2022] [Indexed: 11/29/2022] Open
Abstract
Following CART-19 immunotherapy for B-cell acute lymphoblastic leukaemia (B-ALL), many patients relapse due to loss of the cognate CD19 epitope. Since epitope loss can be caused by aberrant CD19 exon 2 processing, we herein investigate the regulatory code that controls CD19 splicing. We combine high-throughput mutagenesis with mathematical modelling to quantitatively disentangle the effects of all mutations in the region comprising CD19 exons 1-3. Thereupon, we identify ~200 single point mutations that alter CD19 splicing and thus could predispose B-ALL patients to developing CART-19 resistance. Furthermore, we report almost 100 previously unknown splice isoforms that emerge from cryptic splice sites and likely encode non-functional CD19 proteins. We further identify cis-regulatory elements and trans-acting RNA-binding proteins that control CD19 splicing (e.g., PTBP1 and SF3B4) and validate that loss of these factors leads to pervasive CD19 mis-splicing. Our dataset represents a comprehensive resource for identifying predictive biomarkers for CART-19 therapy. Multiple alternative splicing events in CD19 mRNA have been associated with resistance/relapse to CD19 CAR-T therapy in patients with B cell malignancies. Here, by combining patient data and a high-throughput mutagenesis screen, the authors identify single point mutations and RNA-binding proteins that can control CD19 splicing and be associated with CD19 CAR-T therapy resistance.
Affiliation(s)
| | - Laura Schulz
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Mihaela Enculescu
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Claudia Paret
- Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany.,University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany.,German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
| | - Bea Spiekermann
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Mathieu Quesnel-Vallières
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA.,Department of Biochemistry and Biophysics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Manuel Torres-Diz
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Sebastian Unic
- Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
| | - Anke Busch
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Anna Orekhova
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Monika Kuban
- Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
| | - Mikhail Mesitov
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Miriam M Mulorz
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Rawan Shraim
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.,Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
| | - Fridolin Kielisch
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Jörg Faber
- Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany.,University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany.,German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Andrei Thomas-Tikhonenko
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.,Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Max-von-Laue-Str. 15, 60438, Frankfurt, Germany.,Faculty Biological Sciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438, Frankfurt, Germany.
| | - Stefan Legewie
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.,Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany.,Stuttgart Research Center for Systems Biology (SRCSB), University of Stuttgart, Stuttgart, Germany.
| | - Julian König
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.
| |
|
28
|
Li Q, Gloudemans MJ, Geisinger JM, Fan B, Aguet F, Sun T, Ramaswami G, Li YI, Ma JB, Pritchard JK, Montgomery SB, Li JB. RNA editing underlies genetic risk of common inflammatory diseases. Nature 2022; 608:569-577. [PMID: 35922514 PMCID: PMC9790998 DOI: 10.1038/s41586-022-05052-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 33.5] [Received: 03/24/2021] [Accepted: 06/29/2022] [Indexed: 12/12/2022]
Abstract
A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To achieve this, quantitative trait locus (QTL) mapping of genetic variants with intermediate molecular phenotypes such as gene expression and splicing has been widely adopted [1,2]. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear [3,4]. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses [5-11], is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed by expected inverted repeat Alu elements as well as unexpected, highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation [7-9]. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.
Affiliation(s)
- Qin Li
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Michael J. Gloudemans
- Department of Pathology, Stanford University, Stanford, CA, USA.,Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA
| | | | - Boming Fan
- State Key Laboratory of Genetic Engineering, Department of Biochemistry and Biophysics, School of Life Sciences, Fudan University, Shanghai, China
| | | | - Tao Sun
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Gokul Ramaswami
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Yang I. Li
- Department of Genetics, Stanford University, Stanford, CA, USA.,Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
| | - Jin-Biao Ma
- State Key Laboratory of Genetic Engineering, Department of Biochemistry and Biophysics, School of Life Sciences, Fudan University, Shanghai, China
| | - Jonathan K. Pritchard
- Department of Genetics, Stanford University, Stanford, CA, USA.,Department of Biology, Stanford University, Stanford, CA, USA
| | - Stephen B. Montgomery
- Department of Genetics, Stanford University, Stanford, CA, USA.,Department of Pathology, Stanford University, Stanford, CA, USA.,These authors contributed equally: Stephen B. Montgomery, Jin Billy Li
| | - Jin Billy Li
- Department of Genetics, Stanford University, Stanford, CA, USA.
| |
|
29
|
Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet 2022; 23:697-710. [PMID: 35821097 DOI: 10.1038/s41576-022-00514-4] [Citation(s) in RCA: 132] [Impact Index Per Article: 66.0] [Accepted: 06/13/2022] [Indexed: 12/27/2022]
Abstract
A major goal of evolutionary genetics is to understand the genetic processes that give rise to phenotypic diversity in multicellular organisms. Alternative splicing generates multiple transcripts from a single gene, enriching the diversity of proteins and phenotypic traits. It is well established that alternative splicing contributes to key innovations over long evolutionary timescales, such as brain development in bilaterians. However, recent developments in long-read sequencing and the generation of high-quality genome assemblies for diverse organisms have facilitated comparisons of splicing profiles between closely related species, providing insights into how alternative splicing evolves over shorter timescales. Although most splicing variants are probably non-functional, alternative splicing is nonetheless emerging as a dynamic, evolutionarily labile process that can facilitate adaptation and contribute to species divergence.
Affiliation(s)
- Charlotte J Wright
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK.,Department of Zoology, University of Cambridge, Cambridge, UK.
| | | | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK.
| |
|
30
|
Zeng T, Li YI. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol 2022; 23:103. [PMID: 35449021 PMCID: PMC9022248 DOI: 10.1186/s13059-022-02664-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 07/15/2021] [Accepted: 04/04/2022] [Indexed: 11/26/2022] Open
Abstract
Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.
Affiliation(s)
- Tony Zeng
- The College, University of Chicago, Chicago, 60637, IL, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, 60637, IL, USA.
| |
|
31
|
Zou X, Schaefke B, Li Y, Jia F, Sun W, Li G, Liang W, Reif T, Heyd F, Gao Q, Tian S, Li Y, Tang Y, Fang L, Hu Y, Chen W. Mammalian splicing divergence is shaped by drift, buffering in trans, and a scaling law. Life Sci Alliance 2022; 5:5/4/e202101333. [PMID: 34969779 PMCID: PMC8739531 DOI: 10.26508/lsa.202101333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Revised: 12/20/2021] [Accepted: 12/20/2021] [Indexed: 11/24/2022] Open
Abstract
This study globally investigates the allelic splicing pattern in multiple tissues of an F1 hybrid mouse and reveals the underlying driving forces shaping such tissue-dependent splicing divergence. Alternative splicing is ubiquitous, but the mechanisms underlying its pattern of evolutionary divergence across mammalian tissues are still underexplored. Here, we investigated the cis-regulatory divergences and their relationship with tissue-dependent trans-regulation in multiple tissues of an F1 hybrid between two mouse species. Large splicing changes between tissues are highly conserved and likely reflect functional tissue-dependent regulation. In particular, micro-exons frequently exhibit this pattern with high inclusion levels in the brain. Cis-divergence of splicing appears to be largely non-adaptive. Although divergence is in general associated with higher densities of sequence variants in regulatory regions, events with high usage of the dominant isoform apparently tolerate more mutations, explaining why their exon sequences are highly conserved but their intronic splicing site flanking regions are not. Moreover, we demonstrate that non-adaptive mutations are often masked in tissues where accurate splicing likely is more important, and experimentally attribute such buffering effect to trans-regulatory splicing efficiency.
Affiliation(s)
- Xudong Zou
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Bernhard Schaefke
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Yisheng Li
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Fujian Jia
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Wei Sun
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Guipeng Li
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Weizheng Liang
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Tristan Reif
- Institute for Biochemistry, Freie Universität Berlin, Berlin, Germany
| | - Florian Heyd
- Institute for Biochemistry, Freie Universität Berlin, Berlin, Germany
| | - Qingsong Gao
- Laboratory for Systems Biology and Functional Genomics, Berlin Institute for Medical Systems Biology, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
| | - Shuye Tian
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Yanping Li
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Yisen Tang
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Liang Fang
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| | - Yuhui Hu
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
| | - Wei Chen
- Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China.,Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
| |
|
32
|
A broad analysis of splicing regulation in yeast using a large library of synthetic introns. PLoS Genet 2021; 17:e1009805. [PMID: 34570750 PMCID: PMC8496845 DOI: 10.1371/journal.pgen.1009805] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/02/2021] [Revised: 10/07/2021] [Accepted: 09/03/2021] [Indexed: 11/19/2022] Open
Abstract
RNA splicing is a key process in eukaryotic gene expression, in which an intron is spliced out of a pre-mRNA molecule to eventually produce a mature mRNA. Most intron-containing genes are constitutively spliced, hence efficient splicing of an intron is crucial for efficient regulation of gene expression. Here we use a large synthetic oligo library of ~20,000 variants to explore how different intronic sequence features affect splicing efficiency and mRNA expression levels in S. cerevisiae. Introns are defined by three functional sites: the 5’ donor site, the branch site, and the 3’ acceptor site. Using a combinatorial design of synthetic introns, we demonstrate how non-consensus splice site sequences in each of these sites affect splicing efficiency. We then show that S. cerevisiae splicing machinery tends to select alternative 3’ splice sites downstream of the original site, and we suggest that this tendency created a selective pressure, leading to the avoidance of cryptic splice site motifs near introns’ 3’ ends. We further use natural intronic sequences from other yeast species, whose splicing machineries have diverged to various extents, to show how intron architectures in the various species have been adapted to the organism’s splicing machinery. We suggest that the observed tendency for cryptic splicing is a result of a loss of a specific splicing factor, U2AF1. Lastly, we show that synthetic sequences containing two introns give rise to alternative RNA isoforms in S. cerevisiae, demonstrating that merely a synthetic fusion of two introns might suffice to facilitate alternative splicing in yeast. Our study reveals novel mechanisms by which introns are shaped in evolution to allow cells to regulate their transcriptome. In addition, it provides a valuable resource to study the regulation of constitutive and alternative splicing in a model organism.
RNA splicing is a process in which parts of a new pre-mRNA are spliced out of the mRNA molecule to eventually produce a mature mRNA. Those RNA segments that are spliced out are termed introns, and they are found in most genes in eukaryotic organisms. Hence regulation of this process has a major role in the control of gene expression. The budding yeast S. cerevisiae is a popular model organism for eukaryotic cell biology, but in terms of splicing it differs, as it has only a few intron-containing genes. Nevertheless, this species has been used to study basic principles of splicing regulation based on its ~300 introns. Here we used the technology of a large synthetic genetic library to introduce many new intron-containing genes to the yeast genome, to explore splicing regulation at a wider scope than has been possible so far. Reassuringly, our results confirm known regulatory mechanisms, and further expand our understanding of splicing regulation, specifically how the yeast splicing machinery interacts with the end of introns, and how through evolution introns have evolved to avoid unwanted misidentifications of this end. We further demonstrate the potential of the yeast splicing machinery to alternatively splice a two-intron gene, which is common in other eukaryotes but rare in yeast. Our work presents a first-of-its-kind resource for the systematic study of splicing in live cells.
|
33
|
Nguyen LV, Caldas C. Functional genomics approaches to improve pre-clinical drug screening and biomarker discovery. EMBO Mol Med 2021; 13:e13189. [PMID: 34254730 PMCID: PMC8422077 DOI: 10.15252/emmm.202013189] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Revised: 05/23/2021] [Accepted: 06/10/2021] [Indexed: 12/13/2022] Open
Abstract
Advances in sequencing technology have enabled the genomic and transcriptomic characterization of human malignancies with unprecedented detail. However, this wealth of information has been slow to translate into clinically meaningful outcomes. Different models to study human cancers have been established and extensively characterized. Using these models, functional genomic screens and pre-clinical drug screening platforms have identified genetic dependencies that can be exploited with drug therapy. These genetic dependencies can also be used as biomarkers to predict response to treatment. For many cancers, the identification of such biomarkers remains elusive. In this review, we discuss the development and characterization of models used to study human cancers, RNA interference and CRISPR screens to identify genetic dependencies, large-scale pharmacogenomics studies and drug screening approaches to improve pre-clinical drug screening and biomarker discovery.
Affiliation(s)
- Long V Nguyen
  - Department of Oncology and Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
  - Cancer Research UK Cambridge Cancer Centre, Cambridge, UK
- Carlos Caldas
  - Department of Oncology and Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
  - Cancer Research UK Cambridge Cancer Centre, Cambridge, UK
34
Saha K, Fernandez MM, Biswas T, Joseph S, Ghosh G. Discovery of a pre-mRNA structural scaffold as a contributor to the mammalian splicing code. Nucleic Acids Res 2021; 49:7103-7121. [PMID: 34161584 PMCID: PMC8266590 DOI: 10.1093/nar/gkab533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/06/2021] [Revised: 06/03/2021] [Accepted: 06/08/2021] [Indexed: 11/13/2022] Open
Abstract
The specific recognition of splice signals at or near exon-intron junctions is not explained by their weak conservation and instead is postulated to require a multitude of features embedded in the pre-mRNA strand. We explored the possibility of a 3D structural scaffold of AdML, a model pre-mRNA substrate, guiding early spliceosomal components to the splice signal sequences. We find that mutations in the non-cognate splice signal sequences impede recruitment of early spliceosomal components due to disruption of the global structure of the pre-mRNA. We further find that the pre-mRNA segments potentially interacting with the early spliceosomal component U1 snRNP are distributed across the intron, that there is a spatial proximity of 5' and 3' splice sites within the pre-mRNA scaffold, and that an interplay exists between the structural scaffold and splicing regulatory elements in recruiting early spliceosomal components. These results suggest that early spliceosomal components can recognize a 3D structural scaffold beyond the short splice signal sequences, and that in our model pre-mRNA, this scaffold is formed across the intron involving the major splice signals. This provides a conceptual basis to analyze the contribution of recognizable 3D structural scaffolds to the splicing code across the mammalian transcriptome.
Affiliation(s)
- Kaushik Saha
  - Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0375, USA
- Mike Minh Fernandez
  - Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0375, USA
- Tapan Biswas
  - Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0375, USA
- Simpson Joseph
  - Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0375, USA
- Gourisankar Ghosh
  - Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0375, USA
35
MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis. Nat Commun 2021; 12:3353. [PMID: 34099673 PMCID: PMC8184769 DOI: 10.1038/s41467-021-23608-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/23/2020] [Accepted: 05/07/2021] [Indexed: 11/09/2022] Open
Abstract
The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN's effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources.
36
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models or synthetic biology approaches being notable examples. This work delves with a critical and constructive attitude into our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
  - Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain
- José A Cuesta
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
  - Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
  - Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
  - Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
  - The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
  - Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
  - Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
  - Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
37
Hotspot exons are common targets of splicing perturbations. Nat Commun 2021; 12:2756. [PMID: 33980843 PMCID: PMC8115636 DOI: 10.1038/s41467-021-22780-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/03/2019] [Accepted: 02/24/2021] [Indexed: 11/08/2022] Open
Abstract
High-throughput splicing assays have demonstrated that many exonic variants can disrupt splicing; however, splice-disrupting variants distribute non-uniformly across genes. We propose the existence of exons that are particularly susceptible to splice-disrupting variants, which we refer to as hotspot exons. Hotspot exons are also more susceptible to splicing perturbation through drug treatment and knock-down of RNA-binding proteins. We develop a classifier for exonic splice-disrupting variants and use it to infer hotspot exons. We estimate that 1400 exons in the human genome are hotspots. Using panels of splicing reporters, we demonstrate how the ability of an exon to tolerate a mutation is inversely proportional to the strength of its neighboring splice sites. Splicing-disrupting mutations are linked to diseases. By employing a machine learning approach, the authors show that certain exons, termed hotspot exons, are enriched for splicing-disruption variants and susceptible to splicing perturbations.
38
Liu X, Sun T, Shcherbina A, Li Q, Jarmoskaite I, Kappel K, Ramaswami G, Das R, Kundaje A, Li JB. Learning cis-regulatory principles of ADAR-based RNA editing from CRISPR-mediated mutagenesis. Nat Commun 2021; 12:2165. [PMID: 33846332 PMCID: PMC8041805 DOI: 10.1038/s41467-021-22489-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Accepted: 03/15/2021] [Indexed: 11/24/2022] Open
Abstract
Adenosine-to-inosine (A-to-I) RNA editing catalyzed by ADAR enzymes occurs in double-stranded RNAs. Despite a compelling need for a predictive understanding of natural and engineered editing events, how the RNA sequence and structure determine the editing efficiency and specificity (i.e., cis-regulation) is poorly understood. We apply a CRISPR/Cas9-mediated saturation mutagenesis approach to generate libraries of mutations near three natural editing substrates at their endogenous genomic loci. We use machine learning to integrate diverse RNA sequence and structure features to model editing levels measured by deep sequencing. We confirm known features and identify new features important for RNA editing. Training and testing an XGBoost algorithm within the same substrate yields models that explain 68 to 86 percent of substrate-specific variation in editing levels. However, the models do not generalize across substrates, suggesting complex and context-dependent regulation patterns. Our integrative approach can be applied to larger-scale experiments towards deciphering the RNA editing code.
Affiliation(s)
- Xin Liu
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Tao Sun
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Anna Shcherbina
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Qin Li
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Kalli Kappel
  - Biophysics Program, Stanford University, Stanford, CA, USA
- Gokul Ramaswami
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Rhiju Das
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
  - Department of Physics, Stanford University, Stanford, CA, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Jin Billy Li
  - Department of Genetics, Stanford University, Stanford, CA, USA
39
Klaric L, Gisby JS, Papadaki A, Muckian MD, Macdonald-Dunlop E, Zhao JH, Tokolyi A, Persyn E, Pairo-Castineira E, Morris AP, Kalnapenkis A, Richmond A, Landini A, Hedman ÅK, Prins B, Zanetti D, Wheeler E, Kooperberg C, Yao C, Petrie JR, Fu J, Folkersen L, Walker M, Magnusson M, Eriksson N, Mattsson-Carlgren N, Timmers PRHJ, Hwang SJ, Enroth S, Gustafsson S, Vosa U, Chen Y, Siegbahn A, Reiner A, Johansson Å, Thorand B, Gigante B, Hayward C, Herder C, Gieger C, Langenberg C, Levy D, Zhernakova DV, Smith JG, Campbell H, Sundstrom J, Danesh J, Michaëlsson K, Suhre K, Lind L, Wallentin L, Padyukov L, Landén M, Wareham NJ, Göteson A, Hansson O, Eriksson P, Strawbridge RJ, Assimes TL, Esko T, Gyllensten U, Baillie JK, Paul DS, Joshi PK, Butterworth AS, Mälarstig A, Pirastu N, Wilson JF, Peters JE. Mendelian randomisation identifies alternative splicing of the FAS death receptor as a mediator of severe COVID-19. medRxiv 2021:2021.04.01.21254789. [PMID: 33851187 PMCID: PMC8043484 DOI: 10.1101/2021.04.01.21254789] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
Severe COVID-19 is characterised by immunopathology and epithelial injury. Proteomic studies have identified circulating proteins that are biomarkers of severe COVID-19, but cannot distinguish correlation from causation. To address this, we performed Mendelian randomisation (MR) to identify proteins that mediate severe COVID-19. Using protein quantitative trait loci (pQTL) data from the SCALLOP consortium, involving meta-analysis of up to 26,494 individuals, and COVID-19 genome-wide association data from the Host Genetics Initiative, we performed MR for 157 COVID-19 severity protein biomarkers. We identified significant MR results for five proteins: FAS, TNFRSF10A, CCL2, EPHB4 and LGALS9. Further evaluation of these candidates using sensitivity analyses and colocalization testing provided strong evidence to implicate the apoptosis-associated cytokine receptor FAS as a causal mediator of severe COVID-19. This effect was specific to severe disease. Using RNA-seq data from 4,778 individuals, we demonstrate that the pQTL at the FAS locus results from genetically influenced alternative splicing causing skipping of exon 6. We show that the risk allele for very severe COVID-19 increases the proportion of transcripts lacking exon 6, and thereby increases soluble FAS. Soluble FAS acts as a decoy receptor for FAS-ligand, inhibiting apoptosis induced through membrane-bound FAS. In summary, we demonstrate a novel genetic mechanism that contributes to risk of severe COVID-19, highlighting a pathway that may be a promising therapeutic target.
Affiliation(s)
- Lucija Klaric
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Jack S Gisby
  - Department of Immunology and Inflammation, Faculty of Medicine, Imperial College London, London, UK
- Artemis Papadaki
  - Department of Immunology and Inflammation, Faculty of Medicine, Imperial College London, London, UK
- Marisa D Muckian
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Erin Macdonald-Dunlop
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Jing Hua Zhao
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Alex Tokolyi
  - Department of Human Genetics, Wellcome Sanger Institute, Hinxton, UK
- Elodie Persyn
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Erola Pairo-Castineira
  - Roslin Institute, University of Edinburgh, Easter Bush, Edinburgh, UK
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Andrew P Morris
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
- Anne Richmond
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Arianna Landini
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Åsa K Hedman
  - Department of Medicine, Karolinska Institute, Stockholm, Sweden
  - Pfizer Worldwide Research, Development and Medical, Sweden
- Bram Prins
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Daniela Zanetti
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eleanor Wheeler
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Chen Yao
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- John R Petrie
  - Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
- Jingyuan Fu
  - Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
  - Department of Pediatrics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Mark Walker
  - Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Martin Magnusson
  - Department of Clinical Sciences, Lund University, Malmö, Sweden
  - Wallenberg Center for Molecular Medicine, Lund University, Sweden
  - Hypertension in Africa Research Team (HART), North West University, Potchefstroom, South Africa
- Niclas Eriksson
  - Uppsala Clinical Research Center (UCR), Uppsala University, Uppsala, Sweden
- Niklas Mattsson-Carlgren
  - Clinical Memory Research Unit, Faculty of Medicine, Lund University, Lund, Sweden
  - Department of Neurology, Skåne University Hospital, Lund University, Lund, Sweden
  - Wallenberg Center for Molecular Medicine, Lund University, Sweden
- Paul R H J Timmers
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Shih-Jen Hwang
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- Stefan Enroth
  - Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
- Urmo Vosa
  - Institute of Genomics, University of Tartu, 51010, Estonia
- Yan Chen
  - Department of Medicine, Karolinska Institute, Stockholm, Sweden
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Agneta Siegbahn
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Alexander Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Åsa Johansson
  - Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
- Barbara Thorand
  - Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, München-Neuherberg, Germany
  - German Center for Diabetes Research (DZD), München-Neuherberg, Germany
- Bruna Gigante
  - Division of Cardiovascular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Caroline Hayward
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Christian Herder
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - German Center for Diabetes Research (DZD), München-Neuherberg, Germany
- Christian Gieger
  - Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, München-Neuherberg, Germany
  - Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), München-Neuherberg, Germany
- Claudia Langenberg
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
  - Computational Medicine, Berlin Institute of Health (BIH) at Charité - Universitäts Medizin Berlin, Germany
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Daniel Levy
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- Daria V Zhernakova
  - Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
  - Laboratory of Genomic Diversity, Center for Computer Technologies, ITMO University, St. Petersburg, Russia
- J Gustav Smith
  - Department of Cardiology, Clinical Sciences, Lund University
  - Skåne University Hospital, Lund, Sweden
  - Wallenberg Center for Molecular Medicine, Lund University, Sweden
  - Lund University Diabetes Center, Lund University, Lund, Sweden
  - The Wallenberg Laboratory/Department of Molecular and Clinical Medicine, Institute of Medicine, Gothenburg University
  - Department of Cardiology, Sahlgrenska University Hospital, Gothenburg, Sweden
- Harry Campbell
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Johan Sundstrom
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - The George Institute for Global Health, University of New South Wales, Sydney, Australia
- John Danesh
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
  - Department of Human Genetics, Wellcome Sanger Institute, Hinxton, UK
- Karl Michaëlsson
  - Department of Surgical Sciences, Unit of Medical Epidemiology, Uppsala University, Uppsala, Sweden
- Karsten Suhre
  - Department of Physiology and Biophysics, Weill Cornell Medicine-Qatar, Doha, Qatar
- Lars Lind
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Lars Wallentin
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
  - Uppsala Clinical Research Center, Uppsala University, Uppsala, Sweden
- Leonid Padyukov
  - Division of Rheumatology, Department of Medicine Solna, Karolinska Institutet, Sweden
  - Karolinska University Hospital, Stockholm, Sweden
- Mikael Landén
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Nicholas J Wareham
  - MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Andreas Göteson
  - Institute of Neuroscience and Physiology, University of Gothenburg, Gothenburg, Sweden
- Oskar Hansson
  - Clinical Memory Research Unit, Faculty of Medicine, Lund University, Lund, Sweden
  - Memory Clinic, Skåne University Hospital, Malmö, Sweden
- Per Eriksson
  - Division of Cardiovascular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
  - Karolinska University Hospital, Stockholm, Sweden
- Rona J Strawbridge
  - Institute of Health and Wellbeing, College of Medicine, Veterinary and Life Sciences, University of Glasgow, UK
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
  - Division of Cardiovascular Medicine, Department of Medicine, Karolinska Institutet, Stockholm, Sweden
- Themistocles L Assimes
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Palo Alto VA Healthcare System, Palo Alto, CA, USA
- Tonu Esko
  - Institute of Genomics, University of Tartu, 51010, Estonia
- Ulf Gyllensten
  - Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
- J Kenneth Baillie
  - Intensive Care Unit, Royal Infirmary of Edinburgh, 54 Little France Drive, Edinburgh, EH16 5SA, UK
  - Roslin Institute, University of Edinburgh, Easter Bush, Edinburgh, UK
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Dirk S Paul
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - British Heart Foundation Centre of Research Excellence, Addenbrookes Hospital, Cambridge, UK
  - National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, United Kingdom
- Peter K Joshi
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- Adam S Butterworth
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
  - National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, United Kingdom
- Anders Mälarstig
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  - Pfizer Worldwide Research, Development and Medical, Sweden
- Nicola Pirastu
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- James F Wilson
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
  - Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, UK
- James E Peters
  - Department of Immunology and Inflammation, Faculty of Medicine, Imperial College London, London, UK
  - Health Data Research UK, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
40
Grinberg NF, Wallace C. Multi-tissue transcriptome-wide association studies. Genet Epidemiol 2021; 45:324-337. [PMID: 33369784 PMCID: PMC8048510 DOI: 10.1002/gepi.22374] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 11/04/2020] [Accepted: 11/18/2020] [Indexed: 12/20/2022]
Abstract
A transcriptome-wide association study (TWAS) attempts to identify disease associated genes by imputing gene expression into a genome-wide association study (GWAS) using an expression quantitative trait loci (eQTL) data set and then testing for associations with a trait of interest. Regulatory processes may be shared across related tissues and one natural extension of TWAS is harnessing cross-tissue correlation in gene expression to improve prediction accuracy. Here, we studied multi-tissue extensions of lasso regression and random forests (RF), joint lasso and RF-MTL (multi-task learning RF), respectively. We found that, on our chosen eQTL data set, multi-tissue methods were generally more accurate than their single-tissue counterparts, with RF-MTL performing the best. Simulations showed that these benefits generally translated into more associated genes identified, although highlighted that joint lasso had a tendency to erroneously identify genes in one tissue if there existed an eQTL signal for that gene in another. Applying the four methods to a type 1 diabetes GWAS, we found that multi-tissue methods found more unique associated genes for most of the tissues considered. We conclude that multi-tissue methods are competitive and, for some cell types, superior to single-tissue approaches and hold much promise for TWAS studies.
Affiliation(s)
- Nastasiya F. Grinberg
  - Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
- Chris Wallace
  - Department of Medicine, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, UK
  - MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
41
Sruthi CK, Prakash MK. Disentangling the Contribution of Each Descriptive Characteristic of Every Single Mutation to Its Functional Effects. J Chem Inf Model 2021; 61:2090-2098. [PMID: 33754712 DOI: 10.1021/acs.jcim.0c01223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Mutational effects predictions continue to improve in accuracy as advanced artificial intelligence (AI) algorithms are trained on exhaustive experimental data. The next natural questions to ask are if it is possible to gain insights into which attribute of the mutation contributes how much to the mutational effects and if one can develop universal rules for mapping the descriptors to mutational effects. In this work, we mainly address the former aspect using a framework of interpretable AI. Relations between the physicochemical descriptors and their contributions to the mutational effects are extracted by analyzing the data on 29,832 variants from eight systematic deep mutational scan studies. An opposite trend in the dependence of fitness and solubility on the distance of the amino acid from the catalytic sites could be extracted and quantified. The dependence of the mutational effect contributions on the position-specific scoring matrix (PSSM) score for the amino acid after mutation or the BLOSUM score of the substitution showed universal trends. Our attempts in the present work to explain the quantitative differences in the dependence on conservation and SASA across proteins were not successful. The work nevertheless brings transparency into the predictions and development of rules, and will hopefully lead to empirically uncovering the universalities among these rules.
Affiliation(s)
- C K Sruthi
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
| | - Meher K Prakash
- Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore 560064, India
| |
|
42
|
Rentzsch P, Schubach M, Shendure J, Kircher M. CADD-Splice: improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 2021; 13:31. [PMID: 33618777 PMCID: PMC7901104 DOI: 10.1186/s13073-021-00835-9] [Citation(s) in RCA: 354] [Impact Index Per Article: 118.0] [Received: 06/29/2020] [Accepted: 01/20/2021] [Indexed: 02/08/2023] Open
Abstract
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-021-00835-9.
Affiliation(s)
- Philipp Rentzsch
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany; Berlin Institute of Health (BIH), 10178, Berlin, Germany
| | - Max Schubach
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany; Berlin Institute of Health (BIH), 10178, Berlin, Germany
| | - Jay Shendure
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA; Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
| | - Martin Kircher
- Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany; Berlin Institute of Health (BIH), 10178, Berlin, Germany
| |
|
43
|
Liao SE, Regev O. Splicing at the phase-separated nuclear speckle interface: a model. Nucleic Acids Res 2021; 49:636-645. [PMID: 33337476 PMCID: PMC7826271 DOI: 10.1093/nar/gkaa1209] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 10/15/2020] [Revised: 11/24/2020] [Accepted: 12/03/2020] [Indexed: 02/07/2023] Open
Abstract
Phase-separated membraneless bodies play important roles in nucleic acid biology. While current models for the roles of phase separation largely focus on the compartmentalization of constituent proteins, we reason that other properties of phase separation may play functional roles. Specifically, we propose that interfaces of phase-separated membraneless bodies could have functional roles in spatially organizing biochemical reactions. Here we propose such a model for the nuclear speckle, a membraneless body implicated in RNA splicing. In our model, sequence-dependent RNA positioning along the nuclear speckle interface coordinates RNA splicing. Our model asserts that exons are preferentially sequestered into nuclear speckles through binding by SR proteins, while introns are excluded through binding by nucleoplasmic hnRNP proteins. As a result, splice sites at exon-intron boundaries are preferentially positioned at nuclear speckle interfaces. This positioning exposes splice sites to interface-localized spliceosomes, enabling the subsequent splicing reaction. Our model provides a simple mechanism that seamlessly explains much of the complex logic of splicing. This logic includes experimental results such as the antagonistic duality between splicing factors, the position dependence of splicing sequence motifs, and the collective contribution of many motifs to splicing decisions. Similar functional roles for phase-separated interfaces may exist for other membraneless bodies.
Affiliation(s)
- Susan E Liao
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| | - Oded Regev
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
| |
|
44
|
Schrode N, Seah C, Deans PJM, Hoffman G, Brennand KJ. Analysis framework and experimental design for evaluating synergy-driving gene expression. Nat Protoc 2021; 16:812-840. [PMID: 33432232 PMCID: PMC8609447 DOI: 10.1038/s41596-020-00436-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/29/2020] [Accepted: 10/07/2020] [Indexed: 01/29/2023]
Abstract
The mechanisms by which genetic risk variants interact with each other, as well as environmental factors, to contribute to complex genetic disorders remain unclear. We describe in detail our recently published approach to resolve distinct additive and synergistic transcriptomic effects after combinatorial manipulation of genetic variants and/or chemical perturbagens. Although first developed for CRISPR-based perturbation studies of isogenic human induced pluripotent stem cell-derived neurons, our methodology can be broadly applied to any RNA sequencing dataset, provided that raw read counts are available. Whereas other differential expression analyses reveal the effect of individual perturbations, here we specifically query interactions between two or more perturbagens, resolving the extent of non-additive (synergistic) interactions between perturbations. We discuss the careful experimental design required to resolve synergistic effects and considerations of statistical power and how to quantify observed synergy between experiments. Additionally, we speculate on potential future applications and explore the obvious limitations of this approach. Overall, by interrogating the effect of independent factors, alone and in combination, our analytic framework and experimental design facilitate the discovery of convergence and synergy downstream of gene and/or treatment perturbations hypothesized to contribute to complex diseases. We think that this protocol can be successfully applied by any scientist with bioinformatic skills and basic proficiency in the R programming language. Our computational pipeline (https://github.com/nadschro/synergy-analysis) is straightforward, does not require supercomputing support and can be conducted in a single day upon completion of RNA sequencing experiments.
Affiliation(s)
- Nadine Schrode
- Department of Genetics and Genomics, Pamela Sklar Division of Psychiatric Genomics, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029
| | - Carina Seah
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - PJ Michael Deans
- Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Gabriel Hoffman
- Department of Genetics and Genomics, Pamela Sklar Division of Psychiatric Genomics, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029 (corresponding author)
| | - Kristen J. Brennand
- Department of Genetics and Genomics, Pamela Sklar Division of Psychiatric Genomics, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029; Graduate School of Biomedical Science, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA; Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA; Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA (corresponding author)
| |
|
45
|
Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies (scalable oligonucleotide synthesis, RNA sequencing, and CRISPR [clustered regularly interspaced short palindromic repeats]) have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Tomás Lagunas
- Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri.
| |
|
46
|
Schmitz U, Monteuuis G, Petrova V, Shah JS, Rasko JE. Computational Methods for Intron Retention Identification and Quantification. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11567-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
47
|
Kinsler G, Geiler-Samerotte K, Petrov DA. Fitness variation across subtle environmental perturbations reveals local modularity and global pleiotropy of adaptation. eLife 2020; 9:e61271. [PMID: 33263280 PMCID: PMC7880691 DOI: 10.7554/elife.61271] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/20/2020] [Accepted: 12/02/2020] [Indexed: 02/07/2023] Open
Abstract
Building a genotype-phenotype-fitness map of adaptation is a central goal in evolutionary biology. It is difficult even when adaptive mutations are known because it is hard to enumerate which phenotypes make these mutations adaptive. We address this problem by first quantifying how the fitness of hundreds of adaptive yeast mutants responds to subtle environmental shifts. We then model the number of phenotypes these mutations collectively influence by decomposing these patterns of fitness variation. We find that a small number of inferred phenotypes can predict fitness of the adaptive mutations near their original glucose-limited evolution condition. Importantly, inferred phenotypes that matter little to fitness at or near the evolution condition can matter strongly in distant environments. This suggests that adaptive mutations are locally modular (affecting a small number of phenotypes that matter to fitness in the environment where they evolved) yet globally pleiotropic (affecting additional phenotypes that may reduce or improve fitness in new environments).
Affiliation(s)
- Grant Kinsler
- Department of Biology, Stanford University, Stanford, United States
| | - Kerry Geiler-Samerotte
- Department of Biology, Stanford University, Stanford, United States
- Center for Mechanisms of Evolution, School of Life Sciences, Arizona State University, Tempe, United States
| | - Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, United States
| |
|
48
|
Koterniak B, Pilaka PP, Gracida X, Schneider LM, Pritišanac I, Zhang Y, Calarco JA. Global regulatory features of alternative splicing across tissues and within the nervous system of C. elegans. Genome Res 2020; 30:1766-1780. [PMID: 33127752 PMCID: PMC7706725 DOI: 10.1101/gr.267328.120] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/12/2020] [Accepted: 10/28/2020] [Indexed: 12/27/2022]
Abstract
Alternative splicing plays a major role in shaping tissue-specific transcriptomes. Among the broad tissue types present in metazoans, the central nervous system contains some of the highest levels of alternative splicing. Although many documented examples of splicing differences between broad tissue types exist, there remains much to be understood about the splicing factors and the cis sequence elements controlling tissue and neuron subtype-specific splicing patterns. By using translating ribosome affinity purification coupled with deep-sequencing (TRAP-seq) in Caenorhabditis elegans, we have obtained high coverage profiles of ribosome-associated mRNA for three broad tissue classes (nervous system, muscle, and intestine) and two neuronal subtypes (dopaminergic and serotonergic neurons). We have identified hundreds of splice junctions that exhibit distinct splicing patterns between tissue types or within the nervous system. Alternative splicing events differentially regulated between tissues are more often frame-preserving, are more highly conserved across Caenorhabditis species, and are enriched in specific cis regulatory motifs, when compared with other types of exons. By using this information, we have identified a likely mechanism of splicing repression by the RNA-binding protein UNC-75/CELF via interactions with cis elements that overlap a 5′ splice site. Alternatively spliced exons also overlap more frequently with intrinsically disordered peptide regions than constitutive exons. Moreover, regulated exons are often shorter than constitutive exons but are flanked by longer intron sequences. Among these tissue-regulated exons are several highly conserved microexons <27 nt in length. Collectively, our results indicate a rich layer of tissue-specific gene regulation at the level of alternative splicing in C. elegans that parallels the evolutionary forces and constraints observed across metazoa.
Affiliation(s)
- Bina Koterniak
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada
| | - Pallavi P Pilaka
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada
| | - Xicotencatl Gracida
- Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Lisa-Marie Schneider
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada; Department of Chemistry, University of Bayreuth, 95447 Bayreuth, Germany
| | - Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada; Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
| | - Yun Zhang
- Department of Organismal and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - John A Calarco
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario M5S 3G5, Canada
| |
|
49
|
Baeza-Centurion P, Miñana B, Valcárcel J, Lehner B. Mutations primarily alter the inclusion of alternatively spliced exons. eLife 2020; 9:e59959. [PMID: 33112234 PMCID: PMC7673789 DOI: 10.7554/elife.59959] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/12/2020] [Accepted: 10/27/2020] [Indexed: 12/17/2022] Open
Abstract
Genetic analyses and systematic mutagenesis have revealed that synonymous, non-synonymous and intronic mutations frequently alter the inclusion levels of alternatively spliced exons, consistent with the concept that altered splicing might be a common mechanism by which mutations cause disease. However, most exons expressed in any cell are highly-included in mature mRNAs. Here, by performing deep mutagenesis of highly-included exons and by analysing the association between genome sequence variation and exon inclusion across the transcriptome, we report that mutations only very rarely alter the inclusion of highly-included exons. This is true for both exonic and intronic mutations as well as for perturbations in trans. Therefore, mutations that affect splicing are not evenly distributed across primary transcripts but are focussed in and around alternatively spliced exons with intermediate inclusion levels. These results provide a resource for prioritising synonymous and other variants as disease-causing mutations.
Affiliation(s)
- Pablo Baeza-Centurion
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Belén Miñana
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Juan Valcárcel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
| |
|
50
|
Goh YT, Koh CWQ, Sim DY, Roca X, Goh WSS. METTL4 catalyzes m6Am methylation in U2 snRNA to regulate pre-mRNA splicing. Nucleic Acids Res 2020; 48:9250-9261. [PMID: 32813009 PMCID: PMC7498333 DOI: 10.1093/nar/gkaa684] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 02/04/2020] [Revised: 07/23/2020] [Accepted: 08/04/2020] [Indexed: 01/06/2023] Open
Abstract
N6-methylation of 2′-O-methyladenosine (Am) in RNA occurs in eukaryotic cells to generate N6,2′-O-dimethyladenosine (m6Am). Identification of the methyltransferase responsible for m6Am catalysis has accelerated studies on the function of m6Am in RNA processing. While m6Am is generally found in the first transcribed nucleotide of mRNAs, the modification is also found internally within U2 snRNA. However, the writer required for catalyzing internal m6Am formation had remained elusive. By sequencing transcriptome-wide RNA methylation at single-base-resolution, we identified human METTL4 as the writer that directly methylates Am at U2 snRNA position 30 into m6Am. We found that METTL4 localizes to the nucleus and its conserved methyltransferase catalytic site is required for U2 snRNA methylation. By sequencing human cells with overexpressed Mettl4, we determined METTL4’s in vivo target RNA motif specificity. In the absence of Mettl4 in human cells, U2 snRNA lacks m6Am thereby affecting a subset of splicing events that exhibit specific features such as 3′ splice-site weakness and an increase in exon inclusion. These findings suggest that METTL4 methylation of U2 snRNA regulates splicing of specific pre-mRNA transcripts.
Affiliation(s)
- Yeek Teck Goh
- Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore
| | - Casslynn W Q Koh
- Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore
| | - Donald Yuhui Sim
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Xavier Roca
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - W S Sho Goh
- Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore
| |
|