1
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
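MPRAs read out molecular phenotypes by sequencing, and a common way to summarize each variant is a log ratio of RNA to DNA barcode counts. The sketch below assumes that scoring scheme, with counts-per-million depth normalization and a pseudocount of 1. These are illustrative choices, not the review's prescribed pipeline, and `mpra_activity` is a hypothetical helper.

```python
import math
from typing import Dict


def mpra_activity(rna_counts: Dict[str, int], dna_counts: Dict[str, int],
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-variant log2(RNA/DNA) activity after scaling each library to counts per million.

    Assumed scheme for illustration; variants present only in the RNA library are ignored.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for variant, dna in dna_counts.items():
        rna = rna_counts.get(variant, 0)
        # Depth-normalize both counts so libraries sequenced to different depths are comparable.
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[variant] = math.log2(rna_cpm / dna_cpm)
    return activity


# Hypothetical barcode-summed counts for a reference and a mutant sequence.
rna = {"wt": 400, "mut": 100}
dna = {"wt": 200, "mut": 200}
scores = mpra_activity(rna, dna)
```

Equal DNA input with lower RNA output gives the mutant a lower activity score than the reference.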
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
2
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2024:10.1038/s41576-024-00774-2. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
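The position weight matrices mentioned above, the earliest splicing models, can be illustrated with a minimal log-odds scorer. This is a sketch under stated assumptions. The aligned 9-nt 5' splice-site sequences are made-up toy examples, the background is uniform (0.25 per base), and `build_pwm` and `score` are hypothetical names, not functions from any published splicing model.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from equal-length, aligned, uppercase ACGT sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        # Smoothed base frequency at this position, compared with the background frequency.
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds; higher means more splice-site-like."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(pwm[i][base] for i, base in enumerate(seq))


# Hypothetical aligned 5' splice sites (3 exonic + 6 intronic nucleotides).
sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAGGA", "GAGGTAAGA", "CAGGTATGT"]
pwm = build_pwm(sites)
```

A PWM treats positions independently. That limitation motivated the later models, which capture dependencies and long-range context.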
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Jernej Ule
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- National Institute of Chemistry, Ljubljana, Slovenia
3
Posfai A, Zhou J, McCandlish DM, Kinney JB. Gauge fixing for sequence-function relationships. bioRxiv 2024:2024.05.12.593772. [PMID: 38798671 PMCID: PMC11118547 DOI: 10.1101/2024.05.12.593772] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/29/2024]
Abstract
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
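The gauge freedom described above is easiest to see in an additive model. Adding a constant to every character's effect at one position, and subtracting that constant from the overall constant term, leaves every prediction unchanged. The sketch below fixes the gauge with the zero-sum (mean-centered) convention, which is one common choice. It is not the analytically derived family of gauges from the paper, and the parameter layout and function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter layout: (constant term, one {character: effect} dict per position).
Params = Tuple[float, List[Dict[str, float]]]


def predict(params: Params, seq: str) -> float:
    """Additive model: constant plus the effect of each character at its position."""
    const, effects = params
    return const + sum(effects[i][c] for i, c in enumerate(seq))


def zero_sum_gauge(params: Params) -> Params:
    """Mean-center each position's effects and absorb the means into the constant."""
    const, effects = params
    fixed = []
    for pos_effects in effects:
        mean = sum(pos_effects.values()) / len(pos_effects)
        fixed.append({c: v - mean for c, v in pos_effects.items()})
        const += mean
    return const, fixed


# Two parameter sets that differ only by a gauge transformation at position 0.
a = (0.0, [{"A": 1.0, "C": -1.0}, {"A": 0.5, "C": 2.0}])
b = (-3.0, [{"A": 4.0, "C": 2.0}, {"A": 0.5, "C": 2.0}])
```

`a` and `b` give identical predictions for every sequence, yet their raw parameters differ. After gauge fixing, the two sets coincide, so their values can be compared and interpreted.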
Affiliation(s)
- Anna Posfai
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Department of Biology, University of Florida, Gainesville, FL, 32611
- David M. McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Justin B. Kinney
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
4
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION SpliceAI and Pangolin show the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
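One way to frame the score-cutoff problem is to use variants with MPSA-derived labels and pick the threshold that maximizes Youden's J (sensitivity + specificity - 1). This is an assumed criterion for illustration. The abstract does not state which cutoff strategy the authors recommend, and `best_cutoff` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def best_cutoff(scores: Sequence[float], is_sdv: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, Youden's J); variants with score >= cutoff are called disruptive.

    Ties in J keep the lowest cutoff encountered.
    """
    n_pos = sum(is_sdv)
    n_neg = len(is_sdv) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both disruptive and neutral variants")
    best = (float("inf"), -1.0)
    for cutoff in sorted(set(scores)):
        # Count correct calls at this threshold against the assay-derived labels.
        tp = sum(1 for s, y in zip(scores, is_sdv) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, is_sdv) if not y and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical predictor scores and MPSA labels (True = splice-disruptive).
example = best_cutoff([0.05, 0.1, 0.3, 0.6, 0.8, 0.9],
                      [False, False, False, True, True, True])
```

A cutoff chosen this way is specific to the predictor and the labeled set. It may not transfer to genome-wide scoring without recalibration.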
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
5
Derbel H, Zhao Z, Liu Q. Accurate prediction of functional effect of single amino acid variants with deep learning. Comput Struct Biotechnol J 2023; 21:5776-5784. [PMID: 38074467 PMCID: PMC10709104 DOI: 10.1016/j.csbj.2023.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 11/08/2023] [Accepted: 11/09/2023] [Indexed: 02/12/2024]
Abstract
The assessment of functional effect of amino acid variants is a critical biological problem in proteomics for clinical medicine and protein engineering. Although natively occurring variants offer insights into deleterious variants, high-throughput deep mutational experiments enable comprehensive investigation of amino acid variants for a given protein. However, these mutational experiments are too expensive to dissect millions of variants on thousands of proteins. Thus, computational approaches have been proposed, but they heavily rely on hand-crafted evolutionary conservation, limiting their accuracy. Recent advancement in transformers provides a promising solution to precisely estimate the functional effects of protein variants on high-throughput experimental data. Here, we introduce a novel deep learning model, namely Rep2Mut-V2, which leverages learned representation from transformer models. Rep2Mut-V2 significantly enhances the prediction accuracy for 27 types of measurements of functional effects of protein variants. In the evaluation of 38 protein datasets with 118,933 single amino acid variants, Rep2Mut-V2 achieved an average Spearman's correlation coefficient of 0.7. This surpasses the performance of six state-of-the-art methods, including the recently released methods ESM, DeepSequence and EVE. Even with limited training data, Rep2Mut-V2 outperforms ESM and DeepSequence, showing its potential to extend high-throughput experimental analysis for more protein variants to reduce experimental cost. In conclusion, Rep2Mut-V2 provides accurate predictions of the functional effects of single amino acid variants of protein coding sequences. This tool can significantly aid in the interpretation of variants in human disease studies.
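Spearman's correlation coefficient, the metric reported for Rep2Mut-V2, is the Pearson correlation of the ranks of predicted and measured effects. The pure-Python sketch below assigns tied values their average rank. It is not code from Rep2Mut-V2; in practice, `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend the run while the next sorted value ties with the current one.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only ranks matter, a predictor whose scores are any monotonic transform of the measured effects achieves rho = 1.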
Affiliation(s)
- Houssemeddine Derbel
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Zhongming Zhao
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Qian Liu
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- School of Life Sciences, College of Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
6
Liao SE, Sudarshan M, Regev O. Deciphering RNA splicing logic with interpretable machine learning. Proc Natl Acad Sci U S A 2023; 120:e2221165120. [PMID: 37796983 PMCID: PMC10576025 DOI: 10.1073/pnas.2221165120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 08/29/2023] [Indexed: 10/07/2023]
Abstract
Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: Despite their excellent accuracy, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed uncharacterized components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Affiliation(s)
- Susan E. Liao
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- Oded Regev
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
7
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023]
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yi Xing
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
8
Baier F, Gauye F, Perez-Carrasco R, Payne JL, Schaerli Y. Environment-dependent epistasis increases phenotypic diversity in gene regulatory networks. Sci Adv 2023; 9:eadf1773. [PMID: 37224262 DOI: 10.1126/sciadv.adf1773] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/07/2022] [Accepted: 04/17/2023] [Indexed: 05/26/2023]
Abstract
Mutations to gene regulatory networks can be maladaptive or a source of evolutionary novelty. Epistasis confounds our understanding of how mutations affect the expression patterns of gene regulatory networks, a challenge exacerbated by the dependence of epistasis on the environment. We used the toolkit of synthetic biology to systematically assay the effects of pairwise and triplet combinations of mutant genotypes on the expression pattern of a gene regulatory network expressed in Escherichia coli that interprets an inducer gradient across a spatial domain. We uncovered a preponderance of epistasis that can switch in magnitude and sign across the inducer gradient to produce a greater diversity of expression pattern phenotypes than would be possible in the absence of such environment-dependent epistasis. We discuss our findings in the context of the evolution of hybrid incompatibilities and evolutionary novelties.
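Epistasis whose magnitude and sign depend on the environment can be illustrated with a common log-scale definition. It compares the double mutant's log expression with what the two single-mutant effects would predict if they combined additively. Both the definition and the expression values below are hypothetical. They are chosen only to show a sign switch between two inducer concentrations, and the study's own epistasis measure may differ.

```python
import math
from typing import Dict


def pairwise_epistasis(expr: Dict[str, float]) -> float:
    """Log-scale epistasis: log(ab) - log(a) - log(b) + log(wt).

    expr must map "wt", "a", "b" and "ab" to positive expression levels;
    zero means the two mutations combine without interaction.
    """
    log = {g: math.log(v) for g, v in expr.items()}
    return log["ab"] - log["a"] - log["b"] + log["wt"]


# Hypothetical expression levels of the same genotypes at two inducer concentrations.
low_inducer = {"wt": 100.0, "a": 50.0, "b": 50.0, "ab": 40.0}
high_inducer = {"wt": 100.0, "a": 50.0, "b": 50.0, "ab": 10.0}
eps_low = pairwise_epistasis(low_inducer)    # log(1.6): positive epistasis
eps_high = pairwise_epistasis(high_inducer)  # log(0.4): negative epistasis
```

The single mutants behave identically in both environments. Only the double mutant changes, so the sign of epistasis flips with the inducer concentration.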
Affiliation(s)
- Florian Baier
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
- Florence Gauye
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Yolanda Schaerli
- Department of Fundamental Microbiology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
9
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023]
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
10
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. bioRxiv 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
Background Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion SpliceAI and Pangolin showed the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
11
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
- Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
12
Tabet D, Parikh V, Mali P, Roth FP, Claussnitzer M. Scalable Functional Assays for the Interpretation of Human Genetic Variation. Annu Rev Genet 2022; 56:441-465. [PMID: 36055970 DOI: 10.1146/annurev-genet-072920-032107] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 12/24/2022]
Abstract
Scalable sequence-function studies have enabled the systematic analysis and cataloging of hundreds of thousands of coding and noncoding genetic variants in the human genome. This has improved clinical variant interpretation and provided insights into the molecular, biophysical, and cellular effects of genetic variants at an astonishing scale and resolution across the spectrum of allele frequencies. In this review, we explore current applications and prospects for the field and outline the principles underlying scalable functional assay design, with a focus on the study of single-nucleotide coding and noncoding variants.
Affiliation(s)
- Daniel Tabet
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Victoria Parikh
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, USA
- Prashant Mali
- Department of Bioengineering, University of California, San Diego, California, USA
- Frederick P Roth
- Donnelly Centre, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Melina Claussnitzer
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Center for Genomic Medicine and Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Harvard University, Boston, Massachusetts, USA
13
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 2022; 31:R123-R136. [PMID: 35960994 PMCID: PMC9585682 DOI: 10.1093/hmg/ddac196] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/07/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 02/04/2023]
Abstract
Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.
Affiliation(s)
- Peter J Castaldi
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Division of General Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Abdullah Abood
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Charles R Farber
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Gloria M Sheynkman
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
- UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22903, USA
14
Abstract
One core goal of genetics is to systematically understand the mapping between the DNA sequence of an organism (genotype) and its measurable characteristics (phenotype). Understanding this mapping is often challenging because of interactions between mutations, where the result of combining several different mutations can be very different from the sum of their individual effects. Here we provide a statistical framework for modeling complex genetic interactions of this type. The key idea is to ask how fast the effects of mutations change when introducing the same mutation in increasingly distant genetic backgrounds. We then propose a model for phenotypic prediction that takes into account this tendency for the effects of mutations to be more similar in nearby genetic backgrounds.
Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype–phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype–phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA 5′ splice sites, for which we also validate our model predictions via additional low-throughput experiments.
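The abstract's central quantity, how phenotypic correlation changes with mutational distance, can be computed directly when a landscape is fully observed. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes equal-length genotype strings, a fully observed landscape and a toy additive phenotype, and the names `hamming` and `distance_covariance` are hypothetical. It returns the mean product of mean-centered phenotypes over all ordered genotype pairs at each Hamming distance.

```python
from itertools import product

import numpy as np


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_covariance(phenotypes: dict[str, float]) -> dict[int, float]:
    """Mean product of mean-centered phenotypes over all ordered genotype
    pairs at each Hamming distance d (d = 0 gives the phenotypic variance).
    Hypothetical helper; assumes a fully observed landscape."""
    seqs = list(phenotypes)
    centered = np.array([phenotypes[s] for s in seqs], dtype=float)
    centered -= centered.mean()
    length = len(seqs[0])
    sums = np.zeros(length + 1)
    counts = np.zeros(length + 1)
    for i, si in enumerate(seqs):
        for j, sj in enumerate(seqs):
            d = hamming(si, sj)
            sums[d] += centered[i] * centered[j]
            counts[d] += 1
    return {d: float(sums[d] / counts[d]) for d in range(length + 1) if counts[d] > 0}


# Toy additive landscape on binary sequences of length 3 (hypothetical data):
# each '1' adds 1 to the phenotype, with no interactions between sites.
additive = {"".join(g): float(g.count("1")) for g in product("01", repeat=3)}
print(distance_covariance(additive))  # → {0: 0.75, 1: 0.25, 2: -0.25, 3: -0.75}
```

For this additive toy landscape the covariance falls linearly with distance, as (3 − 2d)/4. Pairwise and higher-order interactions change the shape of this curve, and the abstract relates that shape to the fraction of variance due to each order of interaction.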
15
Cortés-López M, Schulz L, Enculescu M, Paret C, Spiekermann B, Quesnel-Vallières M, Torres-Diz M, Unic S, Busch A, Orekhova A, Kuban M, Mesitov M, Mulorz MM, Shraim R, Kielisch F, Faber J, Barash Y, Thomas-Tikhonenko A, Zarnack K, Legewie S, König J. High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance. Nat Commun 2022; 13:5570. [PMID: 36138008 PMCID: PMC9500061 DOI: 10.1038/s41467-022-31818-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/07/2021] [Accepted: 07/05/2022] [Indexed: 11/29/2022]
Abstract
Following CART-19 immunotherapy for B-cell acute lymphoblastic leukaemia (B-ALL), many patients relapse due to loss of the cognate CD19 epitope. Since epitope loss can be caused by aberrant CD19 exon 2 processing, we investigate here the regulatory code that controls CD19 splicing. We combine high-throughput mutagenesis with mathematical modelling to quantitatively disentangle the effects of all mutations in the region comprising CD19 exons 1-3. In this way, we identify ~200 single point mutations that alter CD19 splicing and could thus predispose B-ALL patients to developing CART-19 resistance. Furthermore, we report almost 100 previously unknown splice isoforms that emerge from cryptic splice sites and likely encode non-functional CD19 proteins. We further identify cis-regulatory elements and trans-acting RNA-binding proteins that control CD19 splicing (e.g., PTBP1 and SF3B4) and validate that loss of these factors leads to pervasive CD19 mis-splicing. Our dataset represents a comprehensive resource for identifying predictive biomarkers for CART-19 therapy. Multiple alternative splicing events in CD19 mRNA have been associated with resistance/relapse to CD19 CAR-T therapy in patients with B cell malignancies. Here, by combining patient data and a high-throughput mutagenesis screen, the authors identify single point mutations and RNA-binding proteins that can control CD19 splicing and be associated with CD19 CAR-T therapy resistance.
Affiliation(s)
- Laura Schulz
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Mihaela Enculescu
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Claudia Paret
- Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Bea Spiekermann
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Mathieu Quesnel-Vallières
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Biochemistry and Biophysics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
- Manuel Torres-Diz
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Sebastian Unic
- Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
- Anke Busch
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Anna Orekhova
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Monika Kuban
- Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany
- Mikhail Mesitov
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Miriam M Mulorz
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Rawan Shraim
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Fridolin Kielisch
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Jörg Faber
- Department of Pediatric Hematology/Oncology, Center for Pediatric and Adolescent Medicine, University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; University Cancer Center (UCT), University Medical Center of the Johannes Gutenberg University Mainz, 55131, Mainz, Germany; German Cancer Consortium (DKTK), site Frankfurt/Mainz, Germany, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
- Yoseph Barash
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
- Andrei Thomas-Tikhonenko
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Max-von-Laue-Str. 15, 60438, Frankfurt, Germany; Faculty Biological Sciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438, Frankfurt, Germany
- Stefan Legewie
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany; Department of Systems Biology, Institute for Biomedical Genetics (IBMG), University of Stuttgart, Allmandring 30E, 70569, Stuttgart, Germany; Stuttgart Research Center for Systems Biology (SRCSB), University of Stuttgart, Stuttgart, Germany
- Julian König
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
16
Srivastava M, Payne JL. On the incongruence of genotype-phenotype and fitness landscapes. PLoS Comput Biol 2022; 18:e1010524. [PMID: 36121840 PMCID: PMC9521842 DOI: 10.1371/journal.pcbi.1010524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/05/2022] [Revised: 09/29/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022]
Abstract
The mapping from genotype to phenotype to fitness typically involves multiple nonlinearities that can transform the effects of mutations. For example, mutations may contribute additively to a phenotype, but their effects on fitness may combine non-additively because selection favors a low or intermediate value of that phenotype. This can cause incongruence between the topographical properties of a fitness landscape and its underlying genotype-phenotype landscape. Yet, genotype-phenotype landscapes are often used as a proxy for fitness landscapes to study the dynamics and predictability of evolution. Here, we use theoretical models and empirical data on transcription factor-DNA interactions to systematically study the incongruence of genotype-phenotype and fitness landscapes when selection favors a low or intermediate phenotypic value. Using the theoretical models, we prove a number of fundamental results. For example, selection for low or intermediate phenotypic values does not change simple sign epistasis into reciprocal sign epistasis, implying that genotype-phenotype landscapes with only simple sign epistasis motifs will always give rise to single-peaked fitness landscapes under such selection. More broadly, we show that such selection tends to create fitness landscapes that are more rugged than the underlying genotype-phenotype landscape, but this increased ruggedness typically does not frustrate adaptive evolution because the local adaptive peaks in the fitness landscape tend to be nearly as tall as the global peak. Many of these results carry forward to the empirical genotype-phenotype landscapes, which may help to explain why low- and intermediate-affinity transcription factor-DNA interactions are so prevalent in eukaryotic gene regulation.
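The two-locus motifs named in the abstract (simple and reciprocal sign epistasis) have standard definitions that can be checked mechanically. A minimal sketch, assuming the common textbook definitions rather than the authors' code: `classify_motif` (a hypothetical helper) compares the effect of each mutation in the wild-type and mutant backgrounds. The exponential fitness function is an assumed example of selection favoring low phenotypic values, chosen only to show how an additive phenotype can become non-additive at the fitness level.

```python
import math


def classify_motif(f00, f10, f01, f11, tol=1e-12):
    """Classify a two-locus motif (hypothetical helper). fXY is the value of
    the genotype with allele X at locus A and allele Y at locus B
    (0 = wild type, 1 = mutant)."""
    if abs(f11 - f10 - f01 + f00) <= tol:
        return "no epistasis"
    # Effect of each mutation in the wild-type and in the mutant background.
    a_wt, a_mut = f10 - f00, f11 - f01
    b_wt, b_mut = f01 - f00, f11 - f10
    # Count mutations whose effect changes sign with the genetic background.
    sign_flips = int(a_wt * a_mut < 0) + int(b_wt * b_mut < 0)
    return ("magnitude", "simple sign", "reciprocal sign")[sign_flips]


order = ("00", "10", "01", "11")
# Hypothetical additive phenotype: each mutation adds 1.
phenotype = {"00": 0.0, "10": 1.0, "01": 1.0, "11": 2.0}
# Assumed fitness function favoring low phenotypic values.
fitness = {g: math.exp(-p) for g, p in phenotype.items()}

print(classify_motif(*(phenotype[g] for g in order)))  # → no epistasis
print(classify_motif(*(fitness[g] for g in order)))    # → magnitude
```

The same additive phenotype is non-additive once mapped through the fitness function, although no mutation changes sign here; other fitness functions (for example, an intermediate optimum) can produce sign changes.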
Affiliation(s)
- Malvika Srivastava
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joshua L. Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
17
Zeng T, Li YI. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol 2022; 23:103. [PMID: 35449021 PMCID: PMC9022248 DOI: 10.1186/s13059-022-02664-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 07/15/2021] [Accepted: 04/04/2022] [Indexed: 11/26/2022]
Abstract
Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.
Affiliation(s)
- Tony Zeng
- The College, University of Chicago, Chicago, 60637, IL, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, 60637, IL, USA
18
SRSF6 Regulates the Alternative Splicing of the Apoptotic Fas Gene by Targeting a Novel RNA Sequence. Cancers (Basel) 2022; 14:cancers14081990. [PMID: 35454897 PMCID: PMC9025165 DOI: 10.3390/cancers14081990] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/20/2022] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 12/20/2022]
Abstract
Simple Summary: Alternative splicing (AS) produces multiple mRNA isoforms from a single gene, allowing a large number of proteins to be made. Fas (Apo-1/CD95) pre-mRNA, which encodes a member of the TNF receptor family that mediates apoptosis, can generate pro-apoptotic and anti-apoptotic proteins through AS. Here, we identified SRSF6 as an essential regulator of Fas AS. We further located a new functional target sequence of SRSF6 in Fas splicing. In addition, our large-scale RNA-seq analysis using GTEx and TCGA indicated that while SRSF6 expression was correlated with Fas expression in normal tissues, the correlation was disrupted in tumors. Our results suggest a novel regulatory mechanism of Fas AS.
Alternative splicing (AS) is a process during gene expression that allows the production of multiple mRNAs from a single gene, leading to a larger number of proteins with various functions. The AS of Fas (Apo-1/CD95) pre-mRNA can generate membrane-bound or soluble isoforms with pro-apoptotic and anti-apoptotic functions. SRSF6, a member of the serine/arginine-rich protein family, plays essential roles in both constitutive and alternative splicing. Here, we identified SRSF6 as an important regulatory protein in Fas AS. Inclusion of the Fas cassette exon was decreased by SRSF6-targeting shRNA treatment but increased by SRSF6 overexpression. Deletion and substitution mutagenesis of the Fas minigene demonstrated that mutating the UGCCAA sequence in the Fas cassette exon disrupts SRSF6 function, indicating that these sequences are essential for SRSF6 function in Fas splicing. In addition, biotin-labeled RNA pulldown and immunoblotting analysis showed that SRSF6 interacted with these RNA sequences. Mutagenesis altering splice-site strength demonstrated that the 5′ splice site, but not the 3′ splice site, was required for SRSF6 regulation of Fas pre-mRNA. In addition, a large-scale RNA-seq analysis using GTEx and TCGA indicated that while SRSF6 expression was correlated with Fas expression in normal tissues, the correlation was disrupted in tumors. Furthermore, high SRSF6 expression was linked to high expression of pro-apoptotic and immune activation genes. Therefore, we identified a novel 5′ splice-site-dependent RNA target of SRSF6 in Fas pre-mRNA splicing, and a correlation between SRSF6 and Fas expression.
19
Li C, Haller G, Weihl CC. Current and Future Approaches to Classify VUSs in LGMD-Related Genes. Genes (Basel) 2022; 13:genes13020382. [PMID: 35205425 PMCID: PMC8871643 DOI: 10.3390/genes13020382] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/16/2022] [Revised: 02/11/2022] [Accepted: 02/16/2022] [Indexed: 01/09/2023]
Abstract
Next-generation sequencing (NGS) has revealed large numbers of genetic variants in LGMD-related genes, with most of them classified as variants of uncertain significance (VUSs). VUSs are genetic changes with unknown pathological impact and present a major challenge in genetic test interpretation and disease diagnosis. Understanding the phenotypic consequences of VUSs can provide clinical guidance regarding LGMD risk and therapy. In this review, we provide a brief overview of the subtypes of LGMD, disease diagnosis, current classification systems for investigating VUSs, and a potential deep mutational scanning approach to classify VUSs in LGMD-related genes.
Affiliation(s)
- Chengcheng Li
- Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Gabe Haller
- Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Department of Neurological Surgery, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Conrad C. Weihl
- Department of Neurology, Washington University School of Medicine, Saint Louis, MO 63110, USA
20
Sesta L, Uguzzoni G, Fernandez-de-Cossio-Diaz J, Pagnani A. AMaLa: Analysis of Directed Evolution Experiments via Annealed Mutational Approximated Landscape. Int J Mol Sci 2021; 22:10908. [PMID: 34681569 PMCID: PMC8535593 DOI: 10.3390/ijms222010908] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/28/2021] [Revised: 09/24/2021] [Accepted: 09/27/2021] [Indexed: 01/12/2023]
Abstract
We present Annealed Mutational approximated Landscape (AMaLa), a new method to infer fitness landscapes from the sequencing data of Directed Evolution experiments. Such experiments typically start from a single wild-type sequence, which undergoes Darwinian in vitro evolution via multiple rounds of mutation and selection for a target phenotype. In recent years, Directed Evolution has emerged as a powerful instrument to probe fitness landscapes under controlled experimental conditions and, thanks to high-throughput screening and sequencing, as a relevant testing ground for developing accurate statistical models and inference algorithms. Fitness landscape modeling either uses the enrichment of variant abundances as input, which requires observing the same variants in different rounds, or assumes that the last sequenced round is sampled from an equilibrium distribution. AMaLa aims to leverage the information encoded in the whole time evolution. To do so, while assuming statistical independence of sampling between sequenced rounds, the possible trajectories in sequence space are assigned a time-dependent statistical weight consisting of two contributions: (i) an energy term accounting for the selection process and (ii) a generalized Jukes-Cantor model for the purely mutational step. This simple scheme accurately describes the Directed Evolution dynamics and infers a fitness landscape that correctly reproduces measurements of the phenotype under selection (e.g., antibiotic drug resistance), notably outperforming widely used inference strategies. In addition, we assess the reliability of AMaLa by showing how the inferred statistical model can be used to predict relevant structural properties of the wild-type sequence.
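The "generalized Jukes-Cantor model" for the mutational step is usually understood as the q-state extension of the classic nucleotide model, in which every state mutates to every other state at the same rate. The sketch below computes that model's transition matrix. The continuous-time parameterization (total rate `mu`, time `t`) and the function name are assumptions for illustration; AMaLa's exact formulation (for example, per-round mutation probabilities or the choice of q) is not given in the abstract and may differ.

```python
import numpy as np


def jukes_cantor_matrix(q: int, mu: float, t: float) -> np.ndarray:
    """Transition probabilities after time t for a q-state Jukes-Cantor model
    (hypothetical helper): each site leaves its current state at total rate mu
    and jumps to each of the other q - 1 states with equal probability."""
    decay = np.exp(-q * mu * t / (q - 1))
    same = 1.0 / q + (q - 1) / q * decay  # probability the state is unchanged
    other = 1.0 / q - decay / q           # probability of one specific other state
    # Off-diagonal entries are `other`; the diagonal is raised to `same`.
    return np.full((q, q), other) + np.eye(q) * (same - other)


P = jukes_cantor_matrix(q=4, mu=0.1, t=2.0)
print(P.sum(axis=1))  # each row is a probability distribution
```

With q = 4 the diagonal reduces to the familiar 1/4 + (3/4)·exp(−4μt/3), and as t grows every row approaches the uniform distribution 1/q.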
Affiliation(s)
- Luca Sesta
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Guido Uguzzoni
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Jorge Fernandez-de-Cossio-Diaz
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Sorbonne Université, 24 rue Lhomond, 75005 Paris, France
- Center of Molecular Immunology, Systems Biology Department, Playa, Havana CP 11600, Cuba
- Andrea Pagnani
- Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo, Italy
- INFN, Sezione di Torino, I-10125 Torino, Italy
21
Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 2021; 30:R187-R197. [PMID: 34338757 PMCID: PMC8490018 DOI: 10.1093/hmg/ddab219] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 07/19/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
The application of genomics to medicine has accelerated the discovery of mutations underlying disease and has enhanced our knowledge of the molecular underpinnings of diverse pathologies. As the amount of human genetic material queried via sequencing has grown exponentially in recent years, so too has the number of rare variants observed. Despite progress, our ability to distinguish which rare variants have clinical significance remains limited. Over the last decade, however, powerful experimental approaches have emerged to characterize variant effects orders of magnitude faster than before. Fueled by improved DNA synthesis and sequencing and, more recently, by CRISPR/Cas9 genome editing, multiplex functional assays provide a means of generating variant effect data in wide-ranging experimental systems. Here, I review recent applications of multiplex assays that link human variants to disease phenotypes and I describe emerging strategies that will enhance their clinical utility in coming years.
Affiliation(s)
- Gregory M Findlay
- The Francis Crick Institute, The Genome Function Laboratory, London NW1 1AT, UK
22
Smith CCR, Rieseberg LH, Hulke BS, Kane NC. Aberrant RNA splicing due to genetic incompatibilities in sunflower hybrids. Evolution 2021; 75:2747-2758. [PMID: 34533836 DOI: 10.1111/evo.14360] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/05/2021] [Revised: 06/27/2021] [Accepted: 09/01/2021] [Indexed: 01/18/2023]
Abstract
Genome-scale studies have revealed divergent mRNA splicing patterns between closely related species or populations. However, it is unclear whether splicing differentiation is a simple byproduct of population divergence, or whether it also acts as a mechanism for reproductive isolation. We examined mRNA splicing in wild × domesticated sunflower hybrids and observed 45 novel splice forms that were not found in the wild or domesticated parents, in addition to 16 high-expression parental splice forms that were absent in one or more hybrids. We identify loci associated with variation in the levels of these splice forms, finding that many aberrant transcripts were regulated by multiple alleles with nonadditive interactions. We identified particular spliceosome components that were associated with 21 aberrant isoforms, more than half of which were located in or near regulatory QTL. These incompatibilities often resulted in alteration in the protein-coding regions of the novel transcripts in the form of frameshifts and truncations. By associating the splice variation in these genes with size and growth rate measurements, we found that the cumulative expression of all aberrant transcripts was correlated with a significant reduction in growth rate. Our results lead us to propose a model where divergent splicing regulatory loci could act as incompatibility loci that contribute to the evolution of reproductive isolation.
Affiliation(s)
- Chris C R Smith
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, 80309
- Loren H Rieseberg
- Department of Botany, University of British Columbia, Vancouver, BC, VCR 2A5, Canada
- Brent S Hulke
- Edward T. Schafer Agricultural Research Center, USDA-ARS, Fargo, North Dakota, 58102
- Nolan C Kane
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, 80309
23
Lord J, Baralle D. Splicing in the Diagnosis of Rare Disease: Advances and Challenges. Front Genet 2021; 12:689892. [PMID: 34276790 PMCID: PMC8280750 DOI: 10.3389/fgene.2021.689892] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 04/01/2021] [Accepted: 06/07/2021] [Indexed: 12/13/2022]
Abstract
Mutations that affect splicing are significant contributors to rare disease but are frequently overlooked by diagnostic sequencing pipelines. Greater ascertainment of pathogenic splicing variants will increase diagnostic yields, ending the diagnostic odyssey for patients and families affected by rare disorders, and improving treatment and care strategies. Advances in sequencing technologies, predictive modeling, and understanding of the mechanisms of splicing in recent years pave the way for improved detection and interpretation of splice-affecting variants, yet several limitations still prevent their routine ascertainment in diagnostic testing. This review explores some of these advances in the context of clinical application and discusses challenges to be overcome before these variants are comprehensively and routinely recognized in diagnostics.
Affiliation(s)
- Jenny Lord
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, United Kingdom
24
Manrubia S, Cuesta JA, Aguirre J, Ahnert SE, Altenberg L, Cano AV, Catalán P, Diaz-Uriarte R, Elena SF, García-Martín JA, Hogeweg P, Khatri BS, Krug J, Louis AA, Martin NS, Payne JL, Tarnowski MJ, Weiß M. From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics. Phys Life Rev 2021; 38:55-106. [PMID: 34088608 DOI: 10.1016/j.plrev.2021.03.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/24/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022]
Abstract
Understanding how genotypes map onto phenotypes, fitness, and eventually organisms is arguably the next major missing piece in a fully predictive theory of evolution. We refer to this generally as the problem of the genotype-phenotype map. Though we are still far from achieving a complete picture of these relationships, our current understanding of simpler questions, such as the structure induced in the space of genotypes by sequences mapped to molecular structures, has revealed important facts that deeply affect the dynamical description of evolutionary processes. Empirical evidence supporting the fundamental relevance of features such as phenotypic bias is mounting as well, while the synthesis of conceptual and experimental progress leads to questioning current assumptions on the nature of evolutionary dynamics, with cancer progression models and synthetic biology approaches as notable examples. This work examines, with a critical and constructive attitude, our current knowledge of how genotypes map onto molecular phenotypes and organismal functions, and discusses theoretical and empirical avenues to broaden and improve this comprehension. As a final goal, this community should aim at deriving an updated picture of evolutionary processes soundly relying on the structural properties of genotype spaces, as revealed by modern techniques of molecular and functional analysis.
Affiliation(s)
- Susanna Manrubia
- Department of Systems Biology, Centro Nacional de Biotecnología (CSIC), Madrid, Spain; Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain.
- José A Cuesta
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain; Instituto de Biocomputación y Física de Sistemas Complejos (BiFi), Universidad de Zaragoza, Spain; UC3M-Santander Big Data Institute (IBiDat), Getafe, Madrid, Spain
- Jacobo Aguirre
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Centro de Astrobiología, CSIC-INTA, ctra. de Ajalvir km 4, 28850 Torrejón de Ardoz, Madrid, Spain
- Sebastian E Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Alejandro V Cano
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo Catalán
- Grupo Interdisciplinar de Sistemas Complejos (GISC), Madrid, Spain; Departamento de Matemáticas, Universidad Carlos III de Madrid, Leganés, Spain
- Ramon Diaz-Uriarte
- Department of Biochemistry, Universidad Autónoma de Madrid, Madrid, Spain; Instituto de Investigaciones Biomédicas "Alberto Sols" (UAM-CSIC), Madrid, Spain
- Santiago F Elena
- Instituto de Biología Integrativa de Sistemas, I(2)SysBio (CSIC-UV), València, Spain; The Santa Fe Institute, Santa Fe, NM, USA
- Paulien Hogeweg
- Theoretical Biology and Bioinformatics Group, Utrecht University, the Netherlands
- Bhavin S Khatri
- The Francis Crick Institute, London, UK; Department of Life Sciences, Imperial College London, London, UK
- Joachim Krug
- Institute for Biological Physics, University of Cologne, Köln, Germany
- Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK
- Nora S Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
- Joshua L Payne
- Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marcel Weiß
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, Cambridge, UK; Sainsbury Laboratory, University of Cambridge, Cambridge, UK
25
Mordstein C, Cano L, Morales AC, Young B, Ho AT, Rice AM, Liss M, Hurst LD, Kudla G. Transcription, mRNA export and immune evasion shape the codon usage of viruses. Genome Biol Evol 2021; 13:6275682. [PMID: 33988683 PMCID: PMC8410142 DOI: 10.1093/gbe/evab106] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Accepted: 05/10/2021] [Indexed: 12/15/2022]
Abstract
The nucleotide composition, dinucleotide composition, and codon usage of many viruses differ from those of their hosts. These differences arise because viruses are subject to unique mutation and selection pressures that do not apply to host genomes; however, the molecular mechanisms that underlie these evolutionary forces are unclear. Here, we analysed the patterns of codon usage in 1,520 vertebrate-infecting viruses, focusing on parameters known to be under selection and associated with gene regulation. We find that GC content, dinucleotide content, and splicing and m6A modification-related sequence motifs are associated with the type of genetic material (DNA or RNA), strandedness, and replication compartment of viruses. In an experimental follow-up, we find that the effects of GC content on gene expression depend on whether the genetic material is delivered to the cell as DNA or mRNA, whether it is transcribed by endogenous or exogenous RNA polymerase, and whether transcription takes place in the nucleus or cytoplasm. Our results suggest that viral codon usage cannot be explained by a simple adaptation to the codon usage of the host; instead, it reflects the combination of multiple selective and mutational pressures, including the need for efficient transcription, export, and immune evasion.
Affiliation(s)
- Christine Mordstein
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, UK; The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laura Cano
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, UK
- Atahualpa Castillo Morales
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Bethan Young
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, UK; The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Alexander T Ho
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Alan M Rice
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Michael Liss
- Thermo Fisher Scientific, GENEART GmbH, Regensburg, Germany
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Molecular Medicine, The University of Edinburgh, Edinburgh, UK
26
Routh S, Acharyya A, Dhar R. A two-step PCR assembly for construction of gene variants across large mutational distances. Biol Methods Protoc 2021; 6:bpab007. [PMID: 33928191 PMCID: PMC8062255 DOI: 10.1093/biomethods/bpab007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/13/2021] [Revised: 03/09/2021] [Accepted: 04/01/2021] [Indexed: 11/14/2022]
Abstract
Construction of empirical fitness landscapes has transformed our understanding of genotype-phenotype relationships across genes. However, most empirical fitness landscapes have been constrained to the local genotype neighbourhood of a gene, primarily due to our limited ability to systematically construct genotypes that differ by a large number of mutations. Although a few methods have been proposed in the literature, these techniques either involve several construction steps or require a large number of amplification cycles, which increases the chances of non-specific mutations. Other described methods require amplification of the whole vector, thereby increasing the chances of vector backbone mutations that can have unintended consequences for the study of fitness landscapes. These limitations have substantially constrained us from traversing large mutational distances in the genotype network, thereby limiting our understanding of the interactions between multiple mutations and the role these interactions play in the evolution of novel phenotypes. In the current work, we present a simple but powerful approach that allows us to systematically and accurately construct gene variants at large mutational distances. Our approach relies on building up small fragments containing targeted mutations in the first step, followed by assembly of these fragments into the complete gene fragment by polymerase chain reaction (PCR). We demonstrate the utility of our approach by constructing variants that differ by up to 11 mutations in a model gene. Our work thus provides an accurate method for construction of multi-mutant variants of genes and therefore will transform the studies of empirical fitness landscapes by enabling exploration of genotypes that are far away from a starting genotype.
Affiliation(s)
- Shreya Routh
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
- Anamika Acharyya
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
- Riddhiman Dhar
- Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
27
Cheng J, Çelik MH, Kundaje A, Gagneur J. MTSplice predicts effects of genetic variants on tissue-specific splicing. Genome Biol 2021; 22:94. [PMID: 33789710 PMCID: PMC8011109 DOI: 10.1186/s13059-021-02273-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/06/2020] [Accepted: 01/14/2021] [Indexed: 12/20/2022]
Abstract
We develop the free and open-source model Multi-tissue Splicing (MTSplice) to predict the effects of genetic variants on splicing of cassette exons in 56 human tissues. MTSplice combines MMSplice, which models constitutive regulatory sequences, with a new neural network that models tissue-specific regulatory sequences. MTSplice outperforms MMSplice on predicting tissue-specific variations associated with genetic variants in most tissues of the GTEx dataset, with largest improvements on brain tissues. Furthermore, MTSplice predicts that autism-associated de novo mutations are enriched for variants affecting splicing specifically in the brain. We foresee that MTSplice will aid interpreting variants associated with tissue-specific disorders.
Affiliation(s)
- Jun Cheng
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748, Germany
- Muhammed Hasan Çelik
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748, Germany
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Julien Gagneur
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Institute of Human Genetics, Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
28
Liao SE, Regev O. Splicing at the phase-separated nuclear speckle interface: a model. Nucleic Acids Res 2021; 49:636-645. [PMID: 33337476 PMCID: PMC7826271 DOI: 10.1093/nar/gkaa1209] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 10/15/2020] [Revised: 11/24/2020] [Accepted: 12/03/2020] [Indexed: 02/07/2023]
Abstract
Phase-separated membraneless bodies play important roles in nucleic acid biology. While current models for the roles of phase separation largely focus on the compartmentalization of constituent proteins, we reason that other properties of phase separation may play functional roles. Specifically, we propose that interfaces of phase-separated membraneless bodies could have functional roles in spatially organizing biochemical reactions. Here we propose such a model for the nuclear speckle, a membraneless body implicated in RNA splicing. In our model, sequence-dependent RNA positioning along the nuclear speckle interface coordinates RNA splicing. Our model asserts that exons are preferentially sequestered into nuclear speckles through binding by SR proteins, while introns are excluded through binding by nucleoplasmic hnRNP proteins. As a result, splice sites at exon-intron boundaries are preferentially positioned at nuclear speckle interfaces. This positioning exposes splice sites to interface-localized spliceosomes, enabling the subsequent splicing reaction. Our model provides a simple mechanism that seamlessly explains much of the complex logic of splicing. This logic includes experimental results such as the antagonistic duality between splicing factors, the position dependence of splicing sequence motifs, and the collective contribution of many motifs to splicing decisions. Similar functional roles for phase-separated interfaces may exist for other membraneless bodies.
Affiliation(s)
- Susan E Liao
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Oded Regev
- Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
29
Feng W, Zhao P, Zheng X, Hu Z, Liu J. Profiling Novel Alternative Splicing within Multiple Tissues Provides Useful Insights into Porcine Genome Annotation. Genes (Basel) 2020; 11:genes11121405. [PMID: 33255998 PMCID: PMC7760890 DOI: 10.3390/genes11121405] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/13/2020] [Revised: 11/24/2020] [Accepted: 11/24/2020] [Indexed: 12/22/2022]
Abstract
Alternative splicing (AS) is a process during gene expression that results in a single gene coding for different protein variants. AS contributes to transcriptome and proteome diversity. In order to characterize AS in pigs, genome-wide transcripts and AS events were detected using RNA sequencing of 34 different tissues in Duroc pigs. In total, 138,403 AS events and 29,270 expressed genes were identified. An alternative donor site was the most common AS form and accounted for 44% of the total AS events. Each of the other three AS forms (exon skipping, alternative acceptor site, and intron retention) accounted for approximately 19%. The results showed that the most common AS events, those involving alternative donor sites, could produce different transcripts or proteins that affect biological processes. The expression of genes with tissue-specific AS events showed that gene functions were consistent with tissue functions. AS increased proteome diversity and resulted in novel proteins that gained or lost important functional domains. In summary, these findings extend porcine genome annotation and highlight roles that AS could play in determining tissue identity.
30
Baeza-Centurion P, Miñana B, Valcárcel J, Lehner B. Mutations primarily alter the inclusion of alternatively spliced exons. eLife 2020; 9:59959. [PMID: 33112234 PMCID: PMC7673789 DOI: 10.7554/elife.59959] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/12/2020] [Accepted: 10/27/2020] [Indexed: 12/17/2022]
Abstract
Genetic analyses and systematic mutagenesis have revealed that synonymous, non-synonymous and intronic mutations frequently alter the inclusion levels of alternatively spliced exons, consistent with the concept that altered splicing might be a common mechanism by which mutations cause disease. However, most exons expressed in any cell are highly included in mature mRNAs. Here, by performing deep mutagenesis of highly included exons and by analysing the association between genome sequence variation and exon inclusion across the transcriptome, we report that mutations only very rarely alter the inclusion of highly included exons. This is true for both exonic and intronic mutations as well as for perturbations in trans. Therefore, mutations that affect splicing are not evenly distributed across primary transcripts but are focussed in and around alternatively spliced exons with intermediate inclusion levels. These results provide a resource for prioritising synonymous and other variants as disease-causing mutations.
Affiliation(s)
- Pablo Baeza-Centurion
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Belén Miñana
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Juan Valcárcel
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
31
Cano AV, Payne JL. Mutation bias interacts with composition bias to influence adaptive evolution. PLoS Comput Biol 2020; 16:e1008296. [PMID: 32986712 PMCID: PMC7571706 DOI: 10.1371/journal.pcbi.1008296] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/27/2020] [Revised: 10/19/2020] [Accepted: 08/30/2020] [Indexed: 11/19/2022]
Abstract
Mutation is a biased stochastic process, with some types of mutations occurring more frequently than others. Previous work has used synthetic genotype-phenotype landscapes to study how such mutation bias affects adaptive evolution. Here, we consider 746 empirical genotype-phenotype landscapes, each of which describes the binding affinity of target DNA sequences to a transcription factor, to study the influence of mutation bias on adaptive evolution of increased binding affinity. By using empirical genotype-phenotype landscapes, we need to make only a few assumptions about landscape topography and about the DNA sequences that each landscape contains. The latter is particularly important because the set of sequences that a landscape contains determines the types of mutations that can occur along a mutational path to an adaptive peak. That is, landscapes can exhibit a composition bias (a statistical enrichment of a particular type of mutation relative to a null expectation, throughout an entire landscape or along particular mutational paths) that is independent of any bias in the mutation process. Our results reveal the way in which composition bias interacts with biases in the mutation process under different population genetic conditions, and how such interaction impacts fundamental properties of adaptive evolution, such as its predictability, as well as the evolution of genetic diversity and mutational robustness.
Mutation is often depicted as a random process due to its unpredictable nature. However, such randomness does not imply uniformly distributed outcomes, because some DNA sequence changes happen more frequently than others. Mutation bias can be an orienting factor in adaptive evolution, influencing the mutational trajectories populations follow toward higher-fitness genotypes. Because these trajectories are typically just a small subset of all possible mutational trajectories, they can exhibit composition bias, that is, an enrichment of a particular kind of DNA sequence change, such as transition or transversion mutations. Here, we use empirical data from eukaryotic transcriptional regulation to study how mutation bias and composition bias interact to influence adaptive evolution.
Affiliation(s)
- Alejandro V. Cano
- Institute of Integrative Biology, ETH, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Joshua L. Payne
- Institute of Integrative Biology, ETH, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
32
Jakobson CM, Jarosz DF. What Has a Century of Quantitative Genetics Taught Us About Nature's Genetic Tool Kit? Annu Rev Genet 2020; 54:439-464. [PMID: 32897739 DOI: 10.1146/annurev-genet-021920-102037] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
The complexity of heredity has been appreciated for decades: Many traits are controlled not by a single genetic locus but instead by polymorphisms throughout the genome. The importance of complex traits in biology and medicine has motivated diverse approaches to understanding their detailed genetic bases. Here, we focus on recent systematic studies, many in budding yeast, which have revealed that large numbers of all kinds of molecular variation, from noncoding to synonymous variants, can make significant contributions to phenotype. Variants can affect different traits in opposing directions, and their contributions can be modified by both the environment and the epigenetic state of the cell. The integration of prospective (synthesizing and analyzing variants) and retrospective (examining standing variation) approaches promises to reveal how natural selection shapes quantitative traits. Only by comprehensively understanding nature's genetic tool kit can we predict how phenotypes arise from the complex ensembles of genetic variants in living organisms.
Affiliation(s)
- Christopher M Jakobson
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California 94305, USA
- Daniel F Jarosz
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA
33
Kováčová T, Souček P, Hujová P, Freiberger T, Grodecká L. Splicing Enhancers at Intron-Exon Borders Participate in Acceptor Splice Sites Recognition. Int J Mol Sci 2020; 21:ijms21186553. [PMID: 32911621 PMCID: PMC7554774 DOI: 10.3390/ijms21186553] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/14/2020] [Revised: 09/05/2020] [Accepted: 09/06/2020] [Indexed: 02/07/2023]
Abstract
Acceptor splice site recognition (3′ splice site: 3′ss) is a fundamental step in precursor messenger RNA (pre-mRNA) splicing. Generally, the U2 small nuclear ribonucleoprotein (snRNP) auxiliary factor (U2AF) heterodimer recognizes the 3′ss; its U2AF35 subunit has a dual function: (i) it binds to the intron–exon border of some 3′ss and (ii) it mediates the interactions of enhancer-binding splicing activators with the spliceosome. Alternative mechanisms for 3′ss recognition have been suggested, yet they are still not thoroughly understood. Here, we analyzed 3′ss recognition where the intron–exon border is bound by the ubiquitous splicing regulator SRSF1. Using minigene analysis of two model exons, BRCA2 exon 12 and VARS2 exon 17, and their mutants, we showed that exon inclusion correlated much better with the predicted SRSF1 affinity than with 3′ss quality; SRSF1 affinity was assessed using the Catalog of Inferred Sequence Binding Preferences of RNA binding proteins (CISBP-RNA) database, and 3′ss quality using the maximum entropy algorithm (MaxEnt) predictor and the U2AF35 consensus matrix. RNA affinity purification confirmed SRSF1 binding to the model 3′ss. On the other hand, knockdown experiments revealed that U2AF35 also plays a role in the inclusion of these exons. Most probably, both factors stochastically bind the 3′ss, supporting exon recognition, more apparently in VARS2 exon 17. Identifying splicing activators as 3′ss recognition factors is crucial both for a basic understanding of splicing regulation and for human genetic diagnostics when assessing the effects of variants on splicing.
Affiliation(s)
- Tatiana Kováčová
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, 656 91 Brno, Czech Republic
- Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
- Přemysl Souček
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, 656 91 Brno, Czech Republic
- Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
- Pavla Hujová
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, 656 91 Brno, Czech Republic
- Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
- Tomáš Freiberger
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, 656 91 Brno, Czech Republic
- Faculty of Medicine, Masaryk University, 625 00 Brno, Czech Republic
- Lucie Grodecká
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, 656 91 Brno, Czech Republic
34
DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 2020; 21:207. [PMID: 32799905 PMCID: PMC7429474 DOI: 10.1186/s13059-020-02091-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 12/04/2019] [Accepted: 07/05/2020] [Indexed: 12/30/2022]
Abstract
Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.
35
Zhou J, McCandlish DM. Minimum epistasis interpolation for sequence-function relationships. Nat Commun 2020; 11:1782. [PMID: 32286265 PMCID: PMC7156698 DOI: 10.1038/s41467-020-15512-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/02/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022]
Abstract
Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
Affiliation(s)
- Juannan Zhou
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- David M McCandlish
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
36
Gauthier L, Stynen B, Serohijos AWR, Michnick SW. Genetics' Piece of the PI: Inferring the Origin of Complex Traits and Diseases from Proteome-Wide Protein-Protein Interaction Dynamics. Bioessays 2019; 42:e1900169. [PMID: 31854021 DOI: 10.1002/bies.201900169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2019] [Revised: 11/15/2019] [Indexed: 11/07/2022]
Abstract
How do common and rare genetic polymorphisms contribute to quantitative traits or disease risk and progression? Multiple human traits have been extensively characterized at the genomic level, revealing their complex genetic architecture. However, it is difficult to resolve the mechanisms by which specific variants contribute to a phenotype. Recently, analyses of variant effects on molecular traits have uncovered intermediate mechanisms that link sequence variation to phenotypic changes. Yet, these methods only capture a fraction of genetic contributions to phenotype. Here, in reviewing the field, it is proposed that complex traits can be understood by characterizing the dynamics of biochemical networks within living cells, and that the effects of genetic variation can be captured on these networks by using protein-protein interaction (PPI) methodologies. This synergy between PPI methodologies and the genetics of complex traits opens new avenues to investigate the molecular etiology of human diseases and to facilitate their prevention or treatment.
Affiliation(s)
- Louis Gauthier
- Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada; Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada
- Bram Stynen
- Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada; Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada
- Adrian W R Serohijos
- Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada; Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada
- Stephen W Michnick
- Département de Biochimie, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada; Centre Robert-Cedergren en Bioinformatique et Génomique, Université de Montréal, 2900 Édouard-Montpetit, Montréal, Quebec, H3T 1J4, Canada
37
Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol 2019; 20:223. [PMID: 31679514 PMCID: PMC6827219 DOI: 10.1186/s13059-019-1845-6] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Received: 03/31/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022]
Abstract
Multiplex assays of variant effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here, we present MaveDB (https://www.mavedb.org), a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first such application, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.
Affiliation(s)
- Daniel Esposito
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Jochen Weile
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Lea M Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Anthony T Papenfuss
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
- Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
- Frederick P Roth
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Douglas M Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Bioinformatics and Cancer Genomics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
38
Obeng EA, Stewart C, Abdel-Wahab O. Altered RNA Processing in Cancer Pathogenesis and Therapy. Cancer Discov 2019; 9:1493-1510. [PMID: 31611195 PMCID: PMC6825565 DOI: 10.1158/2159-8290.cd-19-0399] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 04/04/2019] [Revised: 06/21/2019] [Accepted: 08/08/2019] [Indexed: 12/17/2022]
Abstract
Major advances in our understanding of cancer pathogenesis and therapy have come from efforts to catalog genomic alterations in cancer. A growing number of large-scale genomic studies have uncovered mutations that drive cancer by perturbing cotranscriptional and post-transcriptional regulation of gene expression. These include alterations that affect each phase of RNA processing, including splicing, transport, editing, and decay of messenger RNA. The discovery of these events illuminates a number of novel therapeutic vulnerabilities generated by aberrant RNA processing in cancer, several of which have progressed to clinical development. SIGNIFICANCE: There is increased recognition that genetic alterations affecting RNA splicing and polyadenylation are common in cancer and may generate novel therapeutic opportunities. Such mutations may occur within an individual gene or in RNA processing factors themselves, thereby influencing splicing of many downstream target genes. This review discusses the biological impact of these mutations on tumorigenesis and the therapeutic approaches targeting cells bearing these mutations.
Affiliation(s)
- Esther A Obeng
- Department of Oncology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Connor Stewart
- Human Oncology and Pathogenesis Program and Leukemia Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
- Omar Abdel-Wahab
- Human Oncology and Pathogenesis Program and Leukemia Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, New York
39
CRISPR-Cas9-based mutagenesis frequently provokes on-target mRNA misregulation. Nat Commun 2019; 10:4056. [PMID: 31492834 PMCID: PMC6731291 DOI: 10.1038/s41467-019-12028-5] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 05/06/2019] [Accepted: 08/14/2019] [Indexed: 12/16/2022]
Abstract
The introduction of insertion-deletions (INDELs) by non-homologous end-joining (NHEJ) pathway underlies the mechanistic basis of CRISPR-Cas9-directed genome editing. Selective gene ablation using CRISPR-Cas9 is achieved by installation of a premature termination codon (PTC) from a frameshift-inducing INDEL that elicits nonsense-mediated decay (NMD) of the mutant mRNA. Here, by examining the mRNA and protein products of CRISPR targeted genes in a cell line panel with presumed gene knockouts, we detect the production of foreign mRNAs or proteins in ~50% of the cell lines. We demonstrate that these aberrant protein products stem from the introduction of INDELs that promote internal ribosomal entry, convert pseudo-mRNAs (alternatively spliced mRNAs with a PTC) into protein encoding molecules, or induce exon skipping by disruption of exon splicing enhancers (ESEs). Our results reveal challenges to manipulating gene expression outcomes using INDEL-based mutagenesis and strategies useful in mitigating their impact on intended genome-editing outcomes.
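The premise stated at the start of this abstract is that a frameshift-inducing INDEL installs a premature termination codon (PTC). A toy check makes that concrete. This is a minimal sketch on a made-up coding sequence: it only locates the first in-frame stop codon after an edit. It does not model whether NMD is actually triggered, which depends on transcript structure. It also ignores the internal ribosome entry, pseudo-mRNA and exon-skipping outcomes the study reports. All function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of three."""
    return indel_length % 3 != 0


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Remove `length` nucleotides starting at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based start of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy coding sequence (hypothetical): ATG CTA AGC GGC TGA, with its normal stop at position 12.
wild_type = "ATGCTAAGCGGCTGA"
mutant = apply_deletion(wild_type, start=3, length=1)  # 1-nt deletion shifts the frame
print(is_frameshift(1), first_stop_codon(wild_type), first_stop_codon(mutant))
```

In the mutant the shifted frame reads ATG TAA, so translation would terminate at the second codon instead of the fifth.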
40
Li X, Lalić J, Baeza-Centurion P, Dhar R, Lehner B. Changes in gene expression predictably shift and switch genetic interactions. Nat Commun 2019; 10:3886. [PMID: 31467279 PMCID: PMC6715729 DOI: 10.1038/s41467-019-11735-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/10/2019] [Accepted: 07/29/2019] [Indexed: 11/18/2022]
Abstract
Non-additive interactions between mutations occur extensively and also change across conditions, making genetic prediction a difficult challenge. To better understand the plasticity of genetic interactions (epistasis), we combine mutations in a single protein performing a single function (a transcriptional repressor inhibiting a target gene). Even in this minimal system, genetic interactions switch from positive (suppressive) to negative (enhancing) as the expression of the gene changes. These seemingly complicated changes can be predicted using a mathematical model that propagates the effects of mutations on protein folding to the cellular phenotype. More generally, changes in gene expression should be expected to alter the effects of mutations and how they interact whenever the relationship between expression and a phenotype is nonlinear, which is the case for most genes. These results have important implications for understanding genotype-phenotype maps and illustrate how changes in genetic interactions can often—but not always—be predicted by hierarchical mechanistic models. Non-additive genetic interactions are plastic and can complicate genetic prediction. Here, using deep mutagenesis of the lambda repressor, Li et al. reveal that changes in gene expression can alter the strength and direction of genetic interactions between mutations in many genes and develop mathematical models for predicting them.
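The abstract attributes the switch in sign to a nonlinear relationship between expression and phenotype. The sketch below is not the authors' folding-based model. It assumes a generic "global epistasis" map in which log expression and mutation effects add on a latent scale, and that sum is passed through a sigmoid. Under that assumption, the same pair of deleterious mutations shows positive epistasis at low baseline expression and negative epistasis at high expression, purely because of the curvature of the map. All names and numbers are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def phenotype(log_expression: float, mutation_effects: list[float]) -> float:
    """Hypothetical map: additive latent trait (log expression + mutation effects) -> sigmoid."""
    return sigmoid(log_expression + sum(mutation_effects))


def pairwise_epistasis(log_expression: float, effect_a: float, effect_b: float) -> float:
    """epsilon = P(ab) - P(a) - P(b) + P(wt), measured on the phenotype scale."""
    wt = phenotype(log_expression, [])
    single_a = phenotype(log_expression, [effect_a])
    single_b = phenotype(log_expression, [effect_b])
    double = phenotype(log_expression, [effect_a, effect_b])
    return double - single_a - single_b + wt


# The same pair of mutations (each lowering the latent trait by 1) at three expression levels.
for log_expr in (-2.0, 0.0, 4.0):
    print(log_expr, round(pairwise_epistasis(log_expr, -1.0, -1.0), 4))
```

The two mutations never interact on the latent scale; the epistasis and its sign change come entirely from where the wild type sits on the sigmoid.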
Affiliation(s)
- Xianghua Li
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Jasna Lalić
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Pablo Baeza-Centurion
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Riddhiman Dhar
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- ICREA, Pg. Luis Companys 23, Barcelona, 08010, Spain
41
Leman R, Gaildrat P, Le Gac G, Ka C, Fichou Y, Audrezet MP, Caux-Moncoutier V, Caputo SM, Boutry-Kryza N, Léone M, Mazoyer S, Bonnet-Dorion F, Sevenet N, Guillaud-Bataille M, Rouleau E, Bressac-de Paillerets B, Wappenschmidt B, Rossing M, Muller D, Bourdon V, Revillon F, Parsons MT, Rousselin A, Davy G, Castelain G, Castéra L, Sokolowska J, Coulet F, Delnatte C, Férec C, Spurdle AB, Martins A, Krieger S, Houdayer C. Novel diagnostic tool for prediction of variant spliceogenicity derived from a set of 395 combined in silico/in vitro studies: an international collaborative effort. Nucleic Acids Res 2019; 46:7913-7923. [PMID: 29750258 PMCID: PMC6125621 DOI: 10.1093/nar/gky372] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 02/14/2018] [Accepted: 04/27/2018] [Indexed: 12/17/2022]
Abstract
Variant interpretation is the key issue in molecular diagnosis. Spliceogenic variants exemplify this issue as each nucleotide variant can be deleterious via disruption or creation of splice site consensus sequences. Consequently, reliable in silico prediction of variant spliceogenicity would be a major improvement. Thanks to an international effort, a set of 395 variants studied at the mRNA level and occurring in 5′ and 3′ consensus regions (defined as the 11 and 14 bases surrounding the exon/intron junction, respectively) was collected for 11 different genes, including BRCA1, BRCA2, CFTR and RHD, and used to train and validate a new prediction protocol named Splicing Prediction in Consensus Elements (SPiCE). SPiCE combines in silico predictions from SpliceSiteFinder-like and MaxEntScan and uses logistic regression to define optimal decision thresholds. It revealed an unprecedented sensitivity and specificity of 99.5 and 95.2%, respectively, and the impact on splicing was correctly predicted for 98.8% of variants. We therefore propose SPiCE as the new tool for predicting variant spliceogenicity. It could be easily implemented in any diagnostic laboratory as a routine decision making tool to help geneticists to face the deluge of variants in the next-generation sequencing era. SPiCE is accessible at (https://sourceforge.net/projects/spicev2-1/).
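SPiCE is described as combining SpliceSiteFinder-like and MaxEntScan predictions with logistic regression and then choosing decision thresholds, evaluated by sensitivity and specificity. The sketch below illustrates that general recipe only. The feature encoding (fractional loss of each score), the coefficients and the toy variants are hypothetical. They are not the published SPiCE model or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    ssf_loss: float      # fractional loss of the SpliceSiteFinder-like score (hypothetical encoding)
    maxent_loss: float   # fractional loss of the MaxEntScan score (hypothetical encoding)
    spliceogenic: bool   # outcome observed in mRNA studies (ground truth)


# Hypothetical coefficients for illustration only; NOT the published SPiCE parameters.
INTERCEPT, W_SSF, W_MAXENT = -2.0, 8.0, 6.0


def probability_spliceogenic(v: Variant) -> float:
    """Logistic regression: combine the two score losses into a probability."""
    z = INTERCEPT + W_SSF * v.ssf_loss + W_MAXENT * v.maxent_loss
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(variants: list[Variant], threshold: float) -> tuple[float, float]:
    """Classify each variant at `threshold` and compare with the observed outcome."""
    tp = fn = tn = fp = 0
    for v in variants:
        predicted = probability_spliceogenic(v) >= threshold
        if v.spliceogenic and predicted:
            tp += 1
        elif v.spliceogenic:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


training_set = [
    Variant(0.60, 0.50, True),
    Variant(0.30, 0.20, True),
    Variant(0.02, 0.10, True),
    Variant(0.05, 0.00, False),
    Variant(0.15, 0.20, False),
]
print(sensitivity_specificity(training_set, threshold=0.5))
```

Sweeping `threshold` trades sensitivity against specificity. Picking the operating point from such a sweep on validated variants is the general idea behind defining optimal decision thresholds.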
Affiliation(s)
- Raphaël Leman
- Laboratoire de Biologie Clinique et Oncologique, Centre François Baclesse, 14000 Caen, France
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Normandie Univ, UNICAEN, 14000 Caen, France
- Pascaline Gaildrat
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Gérald Le Gac
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, 29200 Brest, France
- Chandran Ka
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, 29200 Brest, France
- Yann Fichou
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, 29200 Brest, France
- Marie-Pierre Audrezet
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, 29200 Brest, France
- Virginie Caux-Moncoutier
- Inserm U830, Institut Curie Centre de Recherches, 75005 Paris, France
- Université Paris Descartes, Sorbonne Paris Cité, 75005 Paris, France
- Service de Génétique, Institut Curie, 75005 Paris, France
- Nadia Boutry-Kryza
- Unité Mixte de Génétique Constitutionnelle des Cancers Fréquents, Hospices Civils de Lyon, 69000 Lyon, France
- Mélanie Léone
- Unité Mixte de Génétique Constitutionnelle des Cancers Fréquents, Hospices Civils de Lyon, 69000 Lyon, France
- Sylvie Mazoyer
- Lyon Neuroscience Research Center-CRNL, Inserm U1028, CNRS UMR 5292, University of Lyon, 69008 Lyon, France
- Françoise Bonnet-Dorion
- Inserm U916, Département de Pathologie, Laboratoire de Génétique Constitutionnelle, Institut Bergonié, 33000 Bordeaux, France
- Nicolas Sevenet
- Inserm U916, Département de Pathologie, Laboratoire de Génétique Constitutionnelle, Institut Bergonié, 33000 Bordeaux, France
- Etienne Rouleau
- Gustave Roussy, Université Paris-Saclay, Département de Biopathologie, 94805 Villejuif, France
- Barbara Wappenschmidt
- Division of Molecular Gynaeco-Oncology, Department of Gynaecology and Obstetrics, University Hospital of Cologne, 50937 Cologne, Germany
- Maria Rossing
- Centre for Genomic Medicine, Rigshospitalet, University of Copenhagen, 1017 Copenhagen, Denmark
- Danielle Muller
- Laboratoire d'Oncogénétique, Centre Paul Strauss, 67000 Strasbourg, France
- Violaine Bourdon
- Laboratoire d'Oncogénétique Moléculaire, Institut Paoli-Calmettes, 13009 Marseille, France
- Françoise Revillon
- Laboratoire d'Oncogénétique Moléculaire Humaine, Centre Oscar Lambret, 59000 Lille, France
- Michael T Parsons
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 4006 Herston, Queensland, Australia
- Antoine Rousselin
- Laboratoire de Biologie Clinique et Oncologique, Centre François Baclesse, 14000 Caen, France
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Grégoire Davy
- Laboratoire de Biologie Clinique et Oncologique, Centre François Baclesse, 14000 Caen, France
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Gaia Castelain
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Laurent Castéra
- Laboratoire de Biologie Clinique et Oncologique, Centre François Baclesse, 14000 Caen, France
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Florence Coulet
- Service de génétique, Hôpital Pitié-Salpêtrière, AP-HP, 75013 Paris, France
- Capucine Delnatte
- Laboratoire de génétique moléculaire, CHU Nantes, 44000 Nantes, France
- Claude Férec
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, 29200 Brest, France
- Amanda B Spurdle
- Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 4006 Herston, Queensland, Australia
- Alexandra Martins
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Sophie Krieger
- Laboratoire de Biologie Clinique et Oncologique, Centre François Baclesse, 14000 Caen, France
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandie Univ, UNIROUEN, Normandy Centre for Genomic and Personalized Medicine, 76031 Rouen, France
- Normandie Univ, UNICAEN, 14000 Caen, France
- Claude Houdayer
- Inserm U830, Institut Curie Centre de Recherches, 75005 Paris, France
- Université Paris Descartes, Sorbonne Paris Cité, 75005 Paris, France
- Service de Génétique, Institut Curie, 75005 Paris, France
42
Frumkin I, Yofe I, Bar-Ziv R, Gurvich Y, Lu YY, Voichek Y, Towers R, Schirman D, Krebber H, Pilpel Y. Evolution of intron splicing towards optimized gene expression is based on various Cis- and Trans-molecular mechanisms. PLoS Biol 2019; 17:e3000423. [PMID: 31442222 PMCID: PMC6728054 DOI: 10.1371/journal.pbio.3000423] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/09/2018] [Revised: 09/05/2019] [Accepted: 08/08/2019] [Indexed: 01/09/2023]
Abstract
Splicing expands, reshapes, and regulates the transcriptome of eukaryotic organisms. Despite its importance, key questions remain unanswered, including the following: Can splicing evolve when organisms adapt to new challenges? How does evolution optimize inefficiency of introns’ splicing and of the splicing machinery? To explore these questions, we evolved yeast cells that were engineered to contain an inefficiently spliced intron inside a gene whose protein product was under selection for an increased expression level. We identified a combination of mutations in Cis (within the gene of interest) and in Trans (in mRNA-maturation machinery). Surprisingly, the mutations in Cis resided outside of known intronic functional sites and improved the intron’s splicing efficiency potentially by easing tight mRNA structures. One of these mutations hampered a protein’s domain that was not under selection, demonstrating the evolutionary flexibility of multi-domain proteins as one domain functionality was improved at the expense of the other domain. The Trans adaptations resided in two proteins, Npl3 and Gbp2, that bind pre-mRNAs and are central to their maturation. Interestingly, these mutations either increased or decreased the affinity of these proteins to mRNA, presumably allowing faster spliceosome recruitment or increased time before degradation of the pre-mRNAs, respectively. Altogether, our work reveals various mechanistic pathways toward optimizations of intron splicing to ultimately adapt gene expression patterns to novel demands. An experimental evolution study involving an inefficiently spliced intron reveals that the splicing machinery, introns, and RNA quality control factors evolve in Cis and in Trans when cells optimize their transcriptome to new challenges.
Affiliation(s)
- Idan Frumkin
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Ido Yofe
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Raz Bar-Ziv
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Yonat Gurvich
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Yen-Yun Lu
- Abteilung für Molekulare Genetik, Institut für Mikrobiologie und Genetik, Göttinger Zentrum für Molekulare Biowissenschaften (GZMB), Georg-August Universität Göttingen, Göttingen, Germany
- Yoav Voichek
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Ruth Towers
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Dvir Schirman
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Heike Krebber
- Abteilung für Molekulare Genetik, Institut für Mikrobiologie und Genetik, Göttinger Zentrum für Molekulare Biowissenschaften (GZMB), Georg-August Universität Göttingen, Göttingen, Germany
- Yitzhak Pilpel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
43
Jobbins AM, Reichenbach LF, Lucas CM, Hudson AJ, Burley GA, Eperon IC. The mechanisms of a mammalian splicing enhancer. Nucleic Acids Res 2019; 46:2145-2158. [PMID: 29394380 PMCID: PMC5861446 DOI: 10.1093/nar/gky056] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/04/2017] [Accepted: 01/19/2018] [Indexed: 12/21/2022]
Abstract
Exonic splicing enhancer (ESE) sequences are bound by serine & arginine-rich (SR) proteins, which in turn enhance the recruitment of splicing factors. It was inferred from measurements of splicing around twenty years ago that Drosophila doublesex ESEs are bound stably by SR proteins, and that the bound proteins interact directly but with low probability with their targets. However, it has not been possible with conventional methods to demonstrate whether mammalian ESEs behave likewise. Using single molecule multi-colour colocalization methods to study SRSF1-dependent ESEs, we have found that the proportion of RNA molecules bound by SRSF1 increases with the number of ESE repeats, but only a single molecule of SRSF1 is bound. We conclude that initial interactions between SRSF1 and an ESE are weak and transient, and that these limit the activity of a mammalian ESE. We tested whether the activation step involves the propagation of proteins along the RNA or direct interactions with 3' splice site components by inserting hexaethylene glycol or abasic RNA between the ESE and the target 3' splice site. These insertions did not block activation, and we conclude that the activation step involves direct interactions. These results support a model in which regulatory proteins bind transiently and in dynamic competition, with the result that each ESE in an exon contributes independently to the probability that an activator protein is bound and in close proximity to a splice site.
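The conclusion here has two parts: only a single SRSF1 molecule is bound, and each ESE contributes independently to the probability that an activator is bound. Both can be captured by a simple equilibrium partition function. This is a minimal sketch, not the authors' quantitative model. It assumes weak, mutually exclusive binding, where each ESE has a hypothetical dimensionless weight equal to its association constant times the SRSF1 concentration.

```python
def probability_activator_bound(site_weights: list[float]) -> float:
    """Probability that one activator is bound to any of the ESEs.

    Assumes mutually exclusive (single-molecule) occupancy: each ESE i contributes a
    statistical weight w_i = K_i * [SRSF1], and the partition function is 1 + sum(w_i).
    """
    total_weight = sum(site_weights)
    return total_weight / (1.0 + total_weight)


# Hypothetical weak, transient site: weight 0.2 per ESE repeat.
for n_repeats in (1, 2, 4):
    print(n_repeats, round(probability_activator_bound([0.2] * n_repeats), 3))
```

Adding repeats raises the bound fraction, as observed, while the model never places more than one activator on the RNA. Because the sites are weak, each extra ESE adds almost the same increment.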
Affiliation(s)
- Andrew M Jobbins
- Leicester Institute of Structural & Chemical Biology and Department of Molecular & Cell Biology, University of Leicester, UK
- Christian M Lucas
- Leicester Institute of Structural & Chemical Biology and Department of Molecular & Cell Biology, University of Leicester, UK
- Andrew J Hudson
- Leicester Institute of Structural & Chemical Biology and Department of Chemistry, University of Leicester, UK
- Glenn A Burley
- Department of Pure and Applied Chemistry, University of Strathclyde, UK
- Ian C Eperon
- Leicester Institute of Structural & Chemical Biology and Department of Molecular & Cell Biology, University of Leicester, UK
44
Abstract
Evolvability is the ability of a biological system to produce phenotypic variation that is both heritable and adaptive. It has long been the subject of anecdotal observations and theoretical work. In recent years, however, the molecular causes of evolvability have been an increasing focus of experimental work. Here, we review recent experimental progress in areas as different as the evolution of drug resistance in cancer cells and the rewiring of transcriptional regulation circuits in vertebrates. This research reveals the importance of three major themes: multiple genetic and non-genetic mechanisms to generate phenotypic diversity, robustness in genetic systems, and adaptive landscape topography. We also discuss the mounting evidence that evolvability can evolve and the question of whether it evolves adaptively.
45
Genetic variations within alternative splicing associated genes are associated with breast cancer susceptibility in Chinese women. Gene 2019; 706:140-145. [DOI: 10.1016/j.gene.2019.05.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/19/2019] [Revised: 05/06/2019] [Accepted: 05/08/2019] [Indexed: 11/20/2022]
46
Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS. Inferring protein 3D structure from deep mutation scans. Nat Genet 2019; 51:1170-1176. [PMID: 31209393 PMCID: PMC7295002 DOI: 10.1038/s41588-019-0432-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 10/09/2018] [Accepted: 04/29/2019] [Indexed: 11/09/2022]
Abstract
We describe an experimental method of three-dimensional (3D) structure determination that exploits the increasing ease of high-throughput mutational scans. Inspired by the success of using natural, evolutionary sequence covariation to compute protein and RNA folds, we explored whether 'laboratory', synthetic sequence variation might also yield 3D structures. We analyzed five large-scale mutational scans and discovered that the pairs of residues with the largest positive epistasis in the experiments are sufficient to determine the 3D fold. We show that the strongest epistatic pairings from genetic screens of three proteins, a ribozyme and a protein interaction reveal 3D contacts within and between macromolecules. Using these experimental epistatic pairs, we compute ab initio folds for a GB1 domain (within 1.8 Å of the crystal structure) and a WW domain (2.1 Å). We propose strategies that reduce the number of mutants needed for contact prediction, suggesting that genomics-based techniques can efficiently predict 3D structure.
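The key step reported here is taking the residue pairs with the largest positive epistasis as predicted 3D contacts, which amounts to ranking and filtering a pairwise score table. This is a minimal sketch that assumes epistasis scores have already been computed per residue pair. The sequence-separation filter, its default value and the toy scores are assumptions. The subsequent ab initio folding from these contacts is not shown.

```python
def top_epistatic_pairs(
    epistasis: dict[tuple[int, int], float], k: int, min_separation: int = 5
) -> list[tuple[int, int]]:
    """Return the k residue pairs with the largest positive epistasis.

    Pairs closer than `min_separation` in sequence are skipped because near neighbours
    are trivially close in space (an assumed filter, not necessarily the authors').
    """
    candidates = [
        (score, i, j)
        for (i, j), score in epistasis.items()
        if abs(j - i) >= min_separation and score > 0
    ]
    candidates.sort(reverse=True)  # strongest positive epistasis first
    return [(i, j) for _, i, j in candidates[:k]]


# Hypothetical epistasis scores keyed by (residue i, residue j).
scores = {(1, 2): 5.0, (3, 20): 2.5, (4, 30): 1.0, (10, 40): 3.1, (5, 25): -2.0}
print(top_epistatic_pairs(scores, k=2))
```

Here (1, 2) is dropped as a sequence neighbour and (5, 25) because its epistasis is negative, leaving (10, 40) and (3, 20) as the predicted contacts.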
Affiliation(s)
- Nathan J Rollins
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Frank J Poelwijk
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Michael A Stiffler
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Nicholas P Gauthier
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
47
Souček P, Réblová K, Kramárek M, Radová L, Grymová T, Hujová P, Kováčová T, Lexa M, Grodecká L, Freiberger T. High-throughput analysis revealed mutations' diverging effects on SMN1 exon 7 splicing. RNA Biol 2019; 16:1364-1376. [PMID: 31213135 DOI: 10.1080/15476286.2019.1630796] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/23/2022]
Abstract
Splicing-affecting mutations can disrupt gene function by altering the transcript assembly. To ascertain splicing dysregulation principles, we modified a minigene assay for the parallel high-throughput evaluation of different mutations by next-generation sequencing. In our model system, all exonic and six intronic positions of the SMN1 gene's exon 7 were mutated to all possible nucleotide variants, which amounted to 180 unique single-nucleotide mutants and 470 double mutants. The mutations resulted in a wide range of splicing aberrations. Exonic splicing-affecting mutations resulted either in substantial exon skipping, supposedly driven by predicted exonic splicing silencer or cryptic donor splice site (5'ss) and de novo 5'ss strengthening and use. On the other hand, a single disruption of exonic splicing enhancer was not sufficient to cause major exon skipping, suggesting these elements can be substituted during exon recognition. While disrupting the acceptor splice site led only to exon skipping, some 5'ss mutations potentiated the use of three different cryptic 5'ss. Generally, single mutations supporting cryptic 5'ss use displayed better pre-mRNA/U1 snRNA duplex stability and increased splicing regulatory element strength across the original 5'ss. Analyzing double mutants supported the predominating splicing regulatory elements' effect, but U1 snRNA binding could contribute to the global balance of splicing isoforms. Based on these findings, we suggest that creating a new splicing enhancer across the mutated 5'ss can be one of the main factors driving cryptic 5'ss use.
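The single-mutant library size follows from simple counting. Each position has 3 alternative bases, so 180 single-nucleotide mutants correspond to 60 mutated positions. That is the 54 nt of SMN1 exon 7 plus the six intronic positions. The same counting shows that the 470 double mutants analysed are a small subset of all possible double mutants over those positions. This is a minimal sketch: the 60-nt sequence below is a placeholder, not the real SMN1 sequence, and the function names are hypothetical.

```python
from math import comb

BASES = "ACGT"


def single_nucleotide_variants(sequence: str) -> list[tuple[int, str, str]]:
    """All (position, reference base, alternative base) substitutions of a sequence."""
    return [
        (position, ref, alt)
        for position, ref in enumerate(sequence)
        for alt in BASES
        if alt != ref
    ]


def count_double_mutants(n_positions: int) -> int:
    """Distinct double mutants: choose 2 positions, then 3 x 3 alternative bases."""
    return comb(n_positions, 2) * 3 * 3


placeholder = "ACGT" * 15  # 60-nt stand-in, NOT the real SMN1 exon 7 region
print(len(single_nucleotide_variants(placeholder)), count_double_mutants(60))
```

With 60 positions the full double-mutant space is 1,770 position pairs times 9 base combinations, or 15,930 variants.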
Affiliation(s)
- Přemysl Souček
- Medical Genomics RG, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Kamila Réblová
- Medical Genomics RG, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Michal Kramárek
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Lenka Radová
- Medical Genomics RG, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Tereza Grymová
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Pavla Hujová
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Tatiana Kováčová
- Medical Genomics RG, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Matej Lexa
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Lucie Grodecká
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Tomáš Freiberger
- Medical Genomics RG, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Molecular Genetics Laboratory, Centre for Cardiovascular Surgery and Transplantation, Brno, Czech Republic
- Faculty of Medicine, Masaryk University, Brno, Czech Republic
48
Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 2019; 20:433-460. [PMID: 31082279 DOI: 10.1146/annurev-genom-083118-014857] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Indexed: 11/09/2022]
Abstract
The same mutation can have different effects in different individuals. One important reason for this is that the outcome of a mutation can depend on the genetic context in which it occurs. This dependency is known as epistasis. In recent years, there has been a concerted effort to quantify the extent of pairwise and higher-order genetic interactions between mutations through deep mutagenesis of proteins and RNAs. This research has revealed two major components of epistasis: nonspecific genetic interactions caused by nonlinearities in genotype-to-phenotype maps, and specific interactions between particular mutations. Here, we provide an overview of our current understanding of the mechanisms causing epistasis at the molecular level, the consequences of genetic interactions for evolution and genetic prediction, and the applications of epistasis for understanding biology and determining macromolecular structures.
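Whether an interaction counts as nonspecific, meaning caused by a nonlinear genotype-to-phenotype map, depends on the scale on which mutation effects are assumed to combine. This is a minimal sketch with hypothetical numbers. When two mutations act multiplicatively, a linear-scale null model reports epistasis. On a log scale, the same four measurements show none. Analyses of deep mutagenesis data often infer the nonlinear map from the data rather than fixing a transform in advance; a fixed log transform is used here only for illustration.

```python
import math


def epistasis(wt: float, a: float, b: float, ab: float, scale: str = "linear") -> float:
    """Pairwise epistasis of a double mutant on a linear or a log phenotype scale."""
    if scale == "linear":
        return ab - a - b + wt
    if scale == "log":
        return math.log(ab) - math.log(a) - math.log(b) + math.log(wt)
    raise ValueError(f"unknown scale: {scale!r}")


# Hypothetical phenotypes in which the two mutation effects combine multiplicatively.
wt, a, b = 1.0, 0.5, 0.8
ab = a * b / wt
print(epistasis(wt, a, b, ab, "linear"), epistasis(wt, a, b, ab, "log"))
```

The nonzero linear-scale value is entirely an artefact of the multiplicative map. Whatever epistasis remains after the appropriate map is accounted for would be the specific, mutation-pair-dependent component.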
Affiliation(s)
- Júlia Domingo
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, 08003 Barcelona, Spain
- Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
49
Wang X, Yang M, Ren D, Terzaghi W, Deng XW, He G. Cis-regulated alternative splicing divergence and its potential contribution to environmental responses in Arabidopsis. Plant J 2019; 97:555-570. [PMID: 30375060 DOI: 10.1111/tpj.14142] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 06/26/2018] [Revised: 10/19/2018] [Accepted: 10/23/2018] [Indexed: 05/14/2023]
Abstract
Alternative splicing (AS) plays key roles in plant development and the responses of plants to environmental changes. However, the mechanisms underlying AS divergence (differential expression of transcript isoforms resulting from AS) in plant accessions and its contribution to responses to environmental stimuli remain unclear. In this study, we investigated genome-wide variation of AS in Arabidopsis thaliana accessions Col-0, Bur-0, C24, Kro-0 and Ler-1, as well as their F1 hybrids, and characterized the regulatory mechanisms for AS divergence by RNA sequencing. We found that most of the divergent AS events in Arabidopsis accessions were cis-regulated by sequence variation, including those in core splice site and splicing motifs. Many genes that differed in AS between Col-0 and Bur-0 were involved in stimulus responses. Further genome-wide association analyses of 22 environmental variables showed that single nucleotide polymorphisms influencing known splice site strength were also associated with environmental stress responses. These results demonstrate that cis-variation in genomic sequences among Arabidopsis accessions was the dominant contributor to AS divergence, and it may contribute to differences in environmental responses among Arabidopsis accessions.
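Comparing parental accessions with their F1 hybrids is a common way to separate cis from trans effects. Both alleles in a hybrid share the same trans-acting environment, so any allelic difference that persists in the F1 is attributed to cis, and the remainder of the parental difference to trans. This is a minimal sketch assuming per-event log2 isoform ratios are already computed. The fixed `tolerance` stands in for the statistical tests a real analysis would use, and the function name and example values are hypothetical.

```python
def classify_divergence(
    parental_log_ratio: float, hybrid_allelic_log_ratio: float, tolerance: float = 0.5
) -> str:
    """Partition splicing divergence between two accessions into cis and trans parts.

    parental_log_ratio:       log2 isoform ratio in accession 1 minus that in accession 2
    hybrid_allelic_log_ratio: the same difference between the two alleles within the F1
    """
    cis = hybrid_allelic_log_ratio  # alleles share one nucleus, so this difference is cis
    trans = parental_log_ratio - hybrid_allelic_log_ratio  # the rest of the parental difference
    has_cis, has_trans = abs(cis) > tolerance, abs(trans) > tolerance
    if has_cis and has_trans:
        return "cis+trans"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "conserved"


for parents, hybrid in [(2.0, 1.9), (2.0, 0.1), (2.0, 1.0), (0.1, 0.0)]:
    print(parents, hybrid, classify_divergence(parents, hybrid))
```

An event whose parental difference is fully reproduced between the F1 alleles, like the first example, is classified as cis-regulated, the category the study found to dominate.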
Affiliation(s)
- Xuncheng Wang
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Advanced Agriculture Sciences and School of Life Sciences, Peking University, Beijing, 100871, China
- Mei Yang
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Advanced Agriculture Sciences and School of Life Sciences, Peking University, Beijing, 100871, China
- Diqiu Ren
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Advanced Agriculture Sciences and School of Life Sciences, Peking University, Beijing, 100871, China
- William Terzaghi
- Department of Biology, Wilkes University, Wilkes-Barre, PA, 18766, USA
- Xing-Wang Deng
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Advanced Agriculture Sciences and School of Life Sciences, Peking University, Beijing, 100871, China
- Guangming He
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, School of Advanced Agriculture Sciences and School of Life Sciences, Peking University, Beijing, 100871, China
50
Baeza-Centurion P, Miñana B, Schmiedel JM, Valcárcel J, Lehner B. Combinatorial Genetics Reveals a Scaling Law for the Effects of Mutations on Splicing. Cell 2019; 176:549-563.e23. [PMID: 30661752 DOI: 10.1016/j.cell.2018.12.010] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 02/26/2018] [Revised: 08/29/2018] [Accepted: 12/07/2018] [Indexed: 02/08/2023]
Abstract
Despite a wealth of molecular knowledge, quantitative laws for accurate prediction of biological phenomena remain rare. Alternative pre-mRNA splicing is an important regulated step in gene expression frequently perturbed in human disease. To understand the combined effects of mutations during evolution, we quantified the effects of all possible combinations of exonic mutations accumulated during the emergence of an alternatively spliced human exon. This revealed that mutation effects scale non-monotonically with the inclusion level of an exon, with each mutation having maximum effect at a predictable intermediate inclusion level. This scaling is observed genome-wide for cis and trans perturbations of splicing, including for natural and disease-associated variants. Mathematical modeling suggests that competition between alternative splice sites is sufficient to cause this non-linearity in the genotype-phenotype map. Combining the global scaling law with specific pairwise interactions between neighboring mutations allows accurate prediction of the effects of complex genotype changes involving >10 mutations.
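The non-monotonic scaling described in this abstract can be illustrated with a toy model. This is a minimal sketch, assuming that each mutation shifts the log-odds of exon inclusion by a fixed, background-independent amount `ddg` (one simple way to encode competition between two splice-site choices); it is an illustrative assumption, not the model fitted by the authors. PSI (percent spliced in) is the fraction of transcripts that include the exon. The helpers `delta_psi`, `psi_of_max_effect`, and the parameter `ddg` are hypothetical names. Under this assumption, the change in PSI caused by a mutation peaks when the starting log-odds equal `-ddg / 2`, so the peak sits at an intermediate inclusion level that depends on the mutation's effect size.

```python
import math


def logistic(x: float) -> float:
    """Map log-odds of inclusion to an inclusion fraction (PSI) in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(psi: float) -> float:
    """Inverse of logistic; psi must lie strictly between 0 and 1."""
    return math.log(psi / (1.0 - psi))


def delta_psi(psi_start: float, ddg: float) -> float:
    """Hypothetical helper: change in PSI when a mutation adds `ddg` to the
    log-odds of inclusion (assumed additive, background-independent effect)."""
    return logistic(logit(psi_start) + ddg) - psi_start


def psi_of_max_effect(ddg: float) -> float:
    """Starting PSI at which |delta_psi| peaks under this toy model.

    Setting d/dx [logistic(x + ddg) - logistic(x)] = 0 requires equal logistic
    slopes at x and x + ddg; the slope is symmetric about 0 and decreases with
    |x|, so the optimum is x = -ddg / 2.
    """
    return logistic(-ddg / 2.0)


if __name__ == "__main__":
    ddg = -2.0  # an exon-skipping mutation (assumed effect size)
    for psi in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"PSI {psi:.1f} -> change {delta_psi(psi, ddg):+.3f}")
    print(f"largest effect at PSI ≈ {psi_of_max_effect(ddg):.3f}")  # ≈ 0.731
```

Changing `ddg` moves the peak: a stronger skipping mutation has its largest effect on exons that start at higher inclusion, while a mutation that promotes inclusion peaks below 50%.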
Affiliation(s)
- Pablo Baeza-Centurion
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
- Belén Miñana
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
- Jörn M Schmiedel
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain
- Juan Valcárcel
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.
- Ben Lehner
- Systems Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain.