1
Farberov S, Ulitsky I. Systematic analysis of the target recognition and repression by the Pumilio proteins. Nucleic Acids Res 2024:gkae929. [PMID: 39470700 DOI: 10.1093/nar/gkae929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 09/23/2024] [Accepted: 10/07/2024] [Indexed: 10/30/2024]
Abstract
RNA binding proteins orchestrate the post-transcriptional fate of RNA molecules, but the principles of their action remain poorly understood. Pumilio (PUM) proteins bind 3' UTRs of mRNAs and lead to mRNA decay. To comprehensively map the determinants of recognition of sequences by PUM proteins in cells and to study the binding outcomes, we developed a massively parallel RNA assay that profiled thousands of PUM-binding sites in cells undergoing various perturbations or RNA immunoprecipitation. By studying fragments from the NORAD long non-coding RNA, we find two features that antagonize repression by PUM proteins: G/C-rich sequences, particularly those upstream of the PUM recognition element, and binding of FAM120A, which limits the repression elicited by PUM-binding sites. We also find that arrays of PUM sites separated by 8-12 bases offer particularly strong repression and use them to develop a highly sensitive reporter for PUM repression. In contrast, PUM sites separated by shorter linkers, such as some of those found in NORAD, exhibit strong activity interdependence, likely mediated by competition between PUM binding and formation of strong secondary structures. Overall, our findings expand our understanding of the determinants of PUM protein activity in human cells.
Affiliation(s)
- Svetlana Farberov
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 7610001, Israel
- Igor Ulitsky
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 7610001, Israel
2
You N, Liu C, Gu Y, Wang R, Jia H, Zhang T, Jiang S, Shi J, Chen M, Guan MX, Sun S, Pei S, Liu Z, Shen N. SpliceTransformer predicts tissue-specific splicing linked to human diseases. Nat Commun 2024; 15:9129. [PMID: 39443442 PMCID: PMC11500173 DOI: 10.1038/s41467-024-53088-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 09/24/2024] [Indexed: 10/25/2024]
Abstract
We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations, and occur at different frequencies across tissue types. Importantly, tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. Additionally, we identify single nucleotide variations that cause brain-specific splicing alterations, and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. Finally, SpTransformer analysis of whole exon sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.
Affiliation(s)
- Ningyuan You
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Chang Liu
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Yuxin Gu
- Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
- Rong Wang
- Department of Hematology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hanying Jia
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Tianyun Zhang
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Song Jiang
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Jinsong Shi
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Min-Xin Guan
- Institute of Genetics, Zhejiang University School of Medicine, Hangzhou, China
- Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Shanshan Pei
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
- Bone Marrow Transplantation Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhihong Liu
- National Clinical Research Center for Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, China
- Ning Shen
- Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, China
3
La Fleur A, Shi Y, Seelig G. Decoding biology with massively parallel reporter assays and machine learning. Genes Dev 2024; 38:843-865. [PMID: 39362779 PMCID: PMC11535156 DOI: 10.1101/gad.351800.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Affiliation(s)
- Alyssa La Fleur
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Yongsheng Shi
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California, Irvine, Irvine, California 92697, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington 98195, USA
4
Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. Am J Hum Genet 2024; 111:2164-2175. [PMID: 39226898 PMCID: PMC11480807 DOI: 10.1016/j.ajhg.2024.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 08/03/2024] [Accepted: 08/06/2024] [Indexed: 09/05/2024]
Abstract
Variants that alter gene splicing are estimated to comprise up to a third of all disease-causing variants, yet they are hard to predict from DNA sequencing data alone. To overcome this, many groups are incorporating RNA-based analyses, which are resource intensive, particularly for diagnostic laboratories. There are thousands of functionally validated variants that induce mis-splicing; however, this information is not consolidated, and they are under-represented in ClinVar, which presents a barrier to variant interpretation and can result in duplication of validation efforts. To address this issue, we developed SpliceVarDB, an online database consolidating over 50,000 variants assayed for their effects on splicing in over 8,000 human genes. We evaluated over 500 published data sources and established a spliceogenicity scale to standardize, harmonize, and consolidate variant validation data generated by a range of experimental protocols. According to the strength of their supporting evidence, variants were classified as "splice-altering" (∼25%), "not splice-altering" (∼25%), and "low-frequency splice-altering" (∼50%), which correspond to weak or indeterminate evidence of spliceogenicity. Importantly, 55% of the splice-altering variants in SpliceVarDB are outside the canonical splice sites (5.6% are deep intronic). These variants can support the variant curation diagnostic pathway and can be used to provide the high-quality data necessary to develop more accurate in silico splicing predictors. The variants are accessible through an online platform, SpliceVarDB, with additional features for visualization, variant information, in silico predictions, and validation metrics. SpliceVarDB is a very large collection of splice-altering variants and is available at https://splicevardb.org.
Affiliation(s)
- Patricia J Sullivan
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia; UNSW Centre for Childhood Cancer Research, UNSW Sydney, Sydney, NSW, Australia
- Julian M W Quinn
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Weilin Wu
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Mark Pinese
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia; School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
- Mark J Cowley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
5
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. From computational models of the splicing code to regulatory mechanisms and therapeutic implications. Nat Rev Genet 2024:10.1038/s41576-024-00774-2. [PMID: 39358547 DOI: 10.1038/s41576-024-00774-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/27/2024] [Indexed: 10/04/2024]
Abstract
Since the discovery of RNA splicing and its role in gene expression, researchers have sought a set of rules, an algorithm or a computational model that could predict the splice isoforms, and their frequencies, produced from any transcribed gene in a specific cellular context. Over the past 30 years, these models have evolved from simple position weight matrices to deep-learning models capable of integrating sequence data across vast genomic distances. Most recently, new model architectures are moving the field closer to context-specific alternative splicing predictions, and advances in sequencing technologies are expanding the type of data that can be used to inform and interpret such models. Together, these developments are driving improved understanding of splicing regulatory mechanisms and emerging applications of the splicing code to the rational design of RNA- and splicing-based therapeutics.
Affiliation(s)
- Charlotte Capitanchik
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- Oscar G Wilkins
- The Francis Crick Institute, London, UK
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Jernej Ule
- The Francis Crick Institute, London, UK
- UK Dementia Research Institute at King's College London, London, UK
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology & Neuroscience, King's College London, London, UK
- National Institute of Chemistry, Ljubljana, Slovenia
6
Kohvakka A, Sattari M, Nättinen J, Aapola U, Gregorová P, Tammela TLJ, Uusitalo H, Sarin LP, Visakorpi T, Latonen L. Long noncoding RNA EPCART regulates translation through PI3K/AKT/mTOR pathway and PDCD4 in prostate cancer. Cancer Gene Ther 2024; 31:1536-1546. [PMID: 39147845 PMCID: PMC11489079 DOI: 10.1038/s41417-024-00822-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 07/29/2024] [Accepted: 08/08/2024] [Indexed: 08/17/2024]
Abstract
While hundreds of cancer-associated long noncoding RNAs (lncRNAs) have been discovered, their functional role in cancer cells is still largely a mystery. An increasing number of lncRNAs are recognized to function in the cytoplasm, e.g., as modulators of translation. Here, we investigated the detailed molecular identity and functional role of EPCART, a lncRNA we previously discovered to be a potential oncogene in prostate cancer (PCa). First, we interrogated the transcript structure of EPCART and then confirmed EPCART to be a non-peptide-coding lncRNA using in silico methods. Pathway analysis of differentially expressed protein-coding genes in EPCART knockout cells implied that EPCART modulates the translational machinery of PCa cells. EPCART was also largely located in the cytoplasm and at the sites of translation. With quantitative proteome analysis on EPCART knockout cells we discovered PDCD4, an inhibitor of protein translation, to be increased by EPCART reduction. Further studies indicated that the inhibitory effect of EPCART silencing on translation was mediated by reduced activation of AKT and inhibition of the mTORC1 pathway. Together, our findings identify EPCART as a translation-associated lncRNA that functions via modulation of the PI3K/AKT/mTORC1 pathway in PCa cells. Furthermore, we provide evidence for the prognostic potential of PDCD4 in PCa tumors in connection with EPCART.
Affiliation(s)
- Annika Kohvakka
- Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University and Tays Cancer Center, Tampere University Hospital, 33520, Tampere, Finland
- Mina Sattari
- Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University and Tays Cancer Center, Tampere University Hospital, 33520, Tampere, Finland
- Janika Nättinen
- Eye and Vision Research Group, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
- Ulla Aapola
- Eye and Vision Research Group, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
- Pavlína Gregorová
- RNAcious Laboratory, Molecular and Integrative Biosciences Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, 00014, Helsinki, Finland
- Teuvo L J Tammela
- Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University and Tays Cancer Center, Tampere University Hospital, 33520, Tampere, Finland
- Department of Urology, Tampere University Hospital, Tampere, Finland
- Hannu Uusitalo
- Eye and Vision Research Group, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
- Tays Eye Centre, Tampere University Hospital, 33520, Tampere, Finland
- L Peter Sarin
- RNAcious Laboratory, Molecular and Integrative Biosciences Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, 00014, Helsinki, Finland
- HiLIFE Helsinki Institute of Life Science, University of Helsinki, 00014, Helsinki, Finland
- Tapio Visakorpi
- Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University and Tays Cancer Center, Tampere University Hospital, 33520, Tampere, Finland
- Fimlab Laboratories Ltd, Tampere University Hospital, 00014, Tampere, Finland
- Leena Latonen
- Institute of Biomedicine, University of Eastern Finland, 70211, Kuopio, Finland
7
Gosai SJ, Castro RI, Fuentes N, Butts JC, Mouri K, Alasoadura M, Kales S, Nguyen TTL, Noche RR, Rao AS, Joy MT, Sabeti PC, Reilly SK, Tewhey R. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature 2024; 634:1211-1220. [PMID: 39443793 PMCID: PMC11525185 DOI: 10.1038/s41586-024-08070-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/26/2023] [Accepted: 09/18/2024] [Indexed: 10/25/2024]
Abstract
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body [1-3]. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs [4-8]. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
Affiliation(s)
- Sager J Gosai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Natalia Fuentes
- The Jackson Laboratory, Bar Harbor, ME, USA
- Harvard College, Harvard University, Cambridge, MA, USA
- John C Butts
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Ramil R Noche
- Department of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA
- Yale Zebrafish Research Core, Yale School of Medicine, New Haven, CT, USA
- Arya S Rao
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- Mary T Joy
- The Jackson Laboratory, Bar Harbor, ME, USA
- Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Immunology and Infectious Diseases, Harvard T H Chan School of Public Health, Harvard University, Boston, MA, USA
- Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
- Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
8
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024]
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adaption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. 
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Affiliation(s)
- Clara D M van Karnebeek
- Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
- European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
- Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
- Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
- The Cure and Action for Tay-Sachs (CATS) Foundation, Altrincham, UK
- Bekim Sadikovic
- Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
- Department of Pathology and Laboratory Medicine, Western University, London, Canada
9
Anwar K, Thaller G, Saeed-Zidane M. Genetic Variations in the NRF2 Microsatellite Contribute to the Regulation of Bovine Sperm-Borne Antioxidant Capacity. Cells 2024; 13:1601. [PMID: 39404365 PMCID: PMC11482559 DOI: 10.3390/cells13191601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Revised: 08/30/2024] [Accepted: 09/18/2024] [Indexed: 10/19/2024]
Abstract
Nuclear factor (erythroid-derived 2)-like 2 (NRF2) is a transcription factor protein-coding gene, considered a master regulator of the cellular stress response. Genetic variations in NRF2 could influence its transcriptional profile and, subsequently, the stress resilience in all cell types, including sperm cells. Therefore, the abundance of sperm-borne antioxidants in association with the genetic variation of a GCC microsatellite located in the 5' upstream region of the NRF2 gene was investigated in young (n = 8) and old (n = 8) Holstein bulls' sperm cells at different seasons. The sperm DNA was sequenced using Sanger sequencing, while the sperm-borne mRNA analysis was carried out using the synthesized cDNA and qPCR. The data were statistically analyzed using GraphPad Prism 10.0.2 software. The results showed that two bulls had a heterozygous genotype of eight and nine GCC repeats, while biallelic of eight, nine, and fifteen repeats were identified in two, ten, and two bulls, respectively. The computational in silico analysis revealed that the NRF2 upstream sequence with 15, 9, and 8 GCC repeats bound with 725, 709, and 707 DNA-binding transcription factor proteins, respectively. Lower quality of sperm DNA was detected in the spring season compared to other seasons and in young bulls compared to old ones, particularly in the summer and autumn seasons. The mRNA expression analysis revealed that the PRDX1 gene was the abundant transcript among the studied sperm-borne antioxidants and was significantly determined in old bulls' spermatozoa. Moreover, two transcripts of the NRF2 gene and antioxidant (SOD1, CAT, GPX1, TXN1, NQO1) genes displayed differential expression patterns between the age groups across seasons in an antioxidant-dependent manner. The bulls with a heterozygous GCC sequence exhibited elevated sperm-borne mRNA levels of NRF2 and PRDX1 transcripts.
Taken together, the findings suggest that the NRF2-GCC microsatellite may contribute to the transcription regulation of NRF2 transcripts and their subsequent downstream antioxidants in bovine sperm cells.
Affiliation(s)
- Mohammed Saeed-Zidane
- Molecular Genetics Group, Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, 24118 Kiel, Germany
10
Giudice J, Jiang H. Splicing regulation through biomolecular condensates and membraneless organelles. Nat Rev Mol Cell Biol 2024; 25:683-700. [PMID: 38773325 DOI: 10.1038/s41580-024-00739-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/22/2024] [Indexed: 05/23/2024]
Abstract
Biomolecular condensates, sometimes also known as membraneless organelles (MLOs), can form through weak multivalent intermolecular interactions of proteins and nucleic acids, a process often associated with liquid-liquid phase separation. Biomolecular condensates are emerging as sites and regulatory platforms of vital cellular functions, including transcription and RNA processing. In the first part of this Review, we comprehensively discuss how alternative splicing regulates the formation and properties of condensates, and conversely the roles of biomolecular condensates in splicing regulation. In the second part, we focus on the spatial connection between splicing regulation and nuclear MLOs such as transcriptional condensates, splicing condensates and nuclear speckles. We then discuss key studies showing how splicing regulation through biomolecular condensates is implicated in human pathologies such as neurodegenerative diseases, different types of cancer, developmental disorders and cardiomyopathies, and conclude with a discussion of outstanding questions pertaining to the roles of condensates and MLOs in splicing regulation and how to experimentally study them.
Affiliation(s)
- Jimena Giudice
- Department of Cell Biology and Physiology, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- McAllister Heart Institute, School of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Hao Jiang
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
11
Lee YF, Phua CZJ, Yuan J, Zhang B, Lee MY, Kannan S, Chiu YHJ, Koh CWQ, Yap CK, Lim EKH, Chen J, Lim Y, Lee JJH, Skanderup AJ, Wang Z, Zhai W, Tan NS, Verma CS, Tay Y, Tan DSW, Tam WL. PARP4 interacts with hnRNPM to regulate splicing during lung cancer progression. Genome Med 2024; 16:91. [PMID: 39034402 PMCID: PMC11265163 DOI: 10.1186/s13073-024-01328-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 04/02/2024] [Indexed: 07/23/2024]
Abstract
BACKGROUND: The identification of cancer driver genes from sequencing data has been crucial in deepening our understanding of tumor biology and expanding targeted therapy options. However, apart from the most commonly altered genes, the mechanisms underlying the contribution of other mutations to cancer acquisition remain understudied. Leveraging our whole-exome sequencing of the largest Asian lung adenocarcinoma (LUAD) cohort (n = 302), we now functionally assess the mechanistic role of a novel driver, PARP4.
METHODS: In vitro and in vivo tumorigenicity assays were used to study the functional effects of PARP4 loss and mutation in multiple lung cancer cell lines. Interactomics analysis by quantitative mass spectrometry was conducted to identify PARP4's interaction partners. Transcriptomic data from cell lines and patient tumors were used to investigate splicing alterations.
RESULTS: PARP4 depletion or mutation (I1039T) promotes the tumorigenicity of KRAS- or EGFR-driven lung cancer cells. Disruption of the vault complex, with which PARP4 is commonly associated, did not alter tumorigenicity, indicating that PARP4's tumor suppressive activity is mediated independently. The splicing regulator hnRNPM is a potentially novel PARP4 interaction partner, the loss of which likewise promotes tumor formation. hnRNPM loss results in splicing perturbations, with a propensity for dysregulated intronic splicing that was similarly observed in PARP4 knockdown cells and in LUAD cohort patients with PARP4 copy number loss.
CONCLUSIONS: PARP4 is a novel modulator of lung adenocarcinoma, where its tumor suppressive activity is mediated not through the vault complex, as conventionally thought, but in association with its novel interaction partner hnRNPM, thus suggesting a role for splicing dysregulation in LUAD tumorigenesis.
Affiliation(s)
- Yi Fei Lee
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Cheryl Zi Jin Phua
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Ju Yuan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Bin Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- May Yin Lee
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Srinivasaraghavan Kannan
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, Matrix, Singapore, 138671, Singapore
- Yui Hei Jasper Chiu
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Casslynn Wei Qian Koh
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Choon Kong Yap
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Edwin Kok Hao Lim
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Jianbin Chen
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Yuhua Lim
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Jane Jia Hui Lee
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Anders Jacobsen Skanderup
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
- Zhenxun Wang
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
  - Centre for Vision Research, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Weiwei Zhai
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Nguan Soon Tan
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, Singapore, 308232, Singapore
- Chandra S Verma
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, Matrix, Singapore, 138671, Singapore
  - Department of Biological Sciences, National University of Singapore, 16 Science Drive 4, Singapore, 117558, Singapore
- Yvonne Tay
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
  - NUS Centre for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore
- Daniel Shao Weng Tan
  - Division of Medical Oncology, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore
- Wai Leong Tam
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Singapore.
  - Cancer Science Institute of Singapore, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore.
  - NUS Centre for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, 14 Medical Drive, Singapore, 117599, Singapore.
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597, Singapore.
|
12
|
Bracken CP, Goodall GJ, Gregory PA. RNA regulatory mechanisms controlling TGF-β signaling and EMT in cancer. Semin Cancer Biol 2024; 102-103:4-16. [PMID: 38917876 DOI: 10.1016/j.semcancer.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/05/2024] [Accepted: 06/13/2024] [Indexed: 06/27/2024]
Abstract
Epithelial-mesenchymal transition (EMT) is a major contributor to metastatic progression and is prominently regulated by TGF-β signalling. Both EMT and TGF-β pathway components are tightly controlled by non-coding RNAs - including microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) - that collectively have major impacts on gene expression and resulting cellular states. While miRNAs are the best characterised regulators of EMT and TGF-β signaling and the miR-200-ZEB1/2 feedback loop plays a central role, important functions for lncRNAs and circRNAs are also now emerging. This review will summarise our current understanding of the roles of non-coding RNAs in EMT and TGF-β signaling with a focus on their functions in cancer progression.
Affiliation(s)
- Cameron P Bracken
  - Centre for Cancer Biology, University of South Australia and SA Pathology, Adelaide, SA 5000, Australia; Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5000, Australia; School of Biological Sciences, Faculty of Sciences, Engineering and Technology, The University of Adelaide, Adelaide, SA 5000, Australia.
- Gregory J Goodall
  - Centre for Cancer Biology, University of South Australia and SA Pathology, Adelaide, SA 5000, Australia; Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5000, Australia; School of Biological Sciences, Faculty of Sciences, Engineering and Technology, The University of Adelaide, Adelaide, SA 5000, Australia.
- Philip A Gregory
  - Centre for Cancer Biology, University of South Australia and SA Pathology, Adelaide, SA 5000, Australia; Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5000, Australia.
|
13
|
Huang AC, Su JY, Hung YJ, Chiang HL, Chen YT, Huang YT, Yu CHA, Lin HN, Lin CL. SpliceAPP: an interactive web server to predict splicing errors arising from human mutations. BMC Genomics 2024; 25:600. [PMID: 38877417 PMCID: PMC11179192 DOI: 10.1186/s12864-024-10512-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 06/07/2024] [Indexed: 06/16/2024] Open
Abstract
BACKGROUND: Splicing variants are a major class of pathogenic mutations, with their severity equivalent to nonsense mutations. However, redundant and degenerate splicing signals hinder functional assessments of sequence variations within introns, particularly at branch sites. We have established a massively parallel splicing assay to assess the impact on splicing of 11,191 disease-relevant variants. Based on the experimental results, we then applied regression-based methods to identify factors determining splicing decisions and their respective weights.
RESULTS: Our statistical modeling is highly sensitive, accurately annotating the splicing defects of near-exon intronic variants, outperforming state-of-the-art predictive tools. We have incorporated the algorithm and branchpoint information into a web-based tool, SpliceAPP, to provide an interactive application. This user-friendly website allows users to upload any genetic variants with genome coordinates (e.g., chr15 74,687,208 A G), and the tool will output predictions for splicing error scores and evaluate the impact on nearby splice sites. Additionally, users can query branch site information within the region of interest.
CONCLUSIONS: In summary, SpliceAPP represents a pioneering approach to screening pathogenic intronic variants, contributing to the development of precision medicine. It also facilitates the annotation of splicing motifs. SpliceAPP is freely accessible using the link https://bc.imb.sinica.edu.tw/SpliceAPP. Source code can be downloaded at https://github.com/hsinnan75/SpliceAPP.
Affiliation(s)
- Ang-Chu Huang
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan
- Jia-Ying Su
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, International Graduate Program, Academia Sinica, Taipei, Taiwan
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yu-Jen Hung
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
- Hung-Lun Chiang
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
- Yi-Ting Chen
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
- Yen-Tsung Huang
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
  - Bioinformatics Program, International Graduate Program, Academia Sinica, Taipei, Taiwan
- Chen-Hsin Albert Yu
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan
- Hsin-Nan Lin
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan.
- Chien-Ling Lin
  - Institute of Molecular Biology, Academia Sinica, No. 128, Sec. 2, Academia Road, Nangang District, Taipei City, 115014, Taiwan.
  - Genome and Systems Biology Degree Program, Academia Sinica and National Taiwan University, Taipei, Taiwan.
  - Bioinformatics Program, International Graduate Program, Academia Sinica, Taipei, Taiwan.
|
14
|
Quinones-Valdez G, Amoah K, Xiao X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. bioRxiv 2024:2024.06.14.599101. [PMID: 38915585 PMCID: PMC11195283 DOI: 10.1101/2024.06.14.599101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Genetic regulation of alternative splicing constitutes an important link between genetic variation and disease. Nonetheless, RNA splicing is regulated by both cis-acting elements and trans-acting splicing factors. Determining splicing events that are directed primarily by the cis- or trans-acting mechanisms will greatly inform our understanding of the genetic basis of disease. Here, we show that long-read RNA-seq, combined with our new method isoLASER, enables a clear segregation of cis- and trans-directed splicing events for individual samples. The genetic linkage of splicing is largely individual-specific, in stark contrast to the tissue-specific pattern of splicing profiles. Analysis of long-read RNA-seq data from human and mouse revealed thousands of cis-directed splicing events susceptible to genetic regulation. We highlight such events in the HLA genes whose analysis was challenging with short-read data. We also highlight novel cis-directed splicing events in Alzheimer's disease-relevant genes such as MAPT and BIN1. Together, the clear demarcation of cis- and trans-directed splicing paves ways for future studies of the genetic basis of disease.
Affiliation(s)
- Giovanni Quinones-Valdez
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kofi Amoah
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
15
|
Buerer L, Clark NE, Welch A, Duan C, Taggart AJ, Townley BA, Wang J, Soemedi R, Rong S, Lin CL, Zeng Y, Katolik A, Staley JP, Damha MJ, Mosammaparast N, Fairbrother WG. The debranching enzyme Dbr1 regulates lariat turnover and intron splicing. Nat Commun 2024; 15:4617. [PMID: 38816363 PMCID: PMC11139901 DOI: 10.1038/s41467-024-48696-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/05/2024] [Indexed: 06/01/2024] Open
Abstract
The majority of genic transcription is intronic. Introns are removed by splicing as branched lariat RNAs which require rapid recycling. The branch site is recognized during splicing catalysis and later debranched by Dbr1 in the rate-limiting step of lariat turnover. Through generation of a viable DBR1 knockout cell line, we find the predominantly nuclear Dbr1 enzyme to encode the sole debranching activity in human cells. Dbr1 preferentially debranches substrates that contain canonical U2 binding motifs, suggesting that branchsites discovered through sequencing do not necessarily represent those favored by the spliceosome. We find that Dbr1 also exhibits specificity for particular 5' splice site sequences. We identify Dbr1 interactors through co-immunoprecipitation mass spectrometry. We present a mechanistic model for Dbr1 recruitment to the branchpoint through the intron-binding protein AQR. In addition to a 20-fold increase in lariats, Dbr1 depletion increases exon skipping. Using ADAR fusions to timestamp lariats, we demonstrate a defect in spliceosome recycling. In the absence of Dbr1, spliceosomal components remain associated with the lariat for a longer period of time. As splicing is co-transcriptional, slower recycling increases the likelihood that downstream exons will be available for exon skipping.
Affiliation(s)
- Luke Buerer
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Nathaniel E Clark
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Anastasia Welch
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Chaorui Duan
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Allison J Taggart
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Brittany A Townley
  - Department of Pathology & Immunology, Center for Genome Integrity, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Jing Wang
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Rachel Soemedi
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
- Stephen Rong
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
  - Department of Genetics, Yale University, New Haven, CT, 06520, USA
- Chien-Ling Lin
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA
  - Institute of Molecular Biology, Academia Sinica, Taipei, 115, Taiwan
- Yi Zeng
  - Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL, 60637, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Adam Katolik
  - Department of Chemistry, McGill University, Montreal, QC, H3A 0B8, Canada
- Jonathan P Staley
  - Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL, 60637, USA
- Masad J Damha
  - Department of Chemistry, McGill University, Montreal, QC, H3A 0B8, Canada
- Nima Mosammaparast
  - Department of Pathology & Immunology, Center for Genome Integrity, Washington University School of Medicine, St. Louis, MO, 63110, USA
- William G Fairbrother
  - Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, 02903, USA.
  - Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA.
|
16
|
Singh S, Deshetty UM, Ray S, Oladapo A, Horanieh E, Buch S, Periyasamy P. Non-Coding RNAs in HIV Infection, NeuroHIV, and Related Comorbidities. Cells 2024; 13:898. [PMID: 38891030 PMCID: PMC11171711 DOI: 10.3390/cells13110898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/20/2024] Open
Abstract
NeuroHIV affects approximately 30-60% of people living with HIV-1 (PLWH) and is characterized by varying degrees of cognitive impairments, presenting a multifaceted challenge, the underlying cause of which is chronic, low-level neuroinflammation. Such smoldering neuroinflammation is likely an outcome of lifelong reliance on antiretrovirals coupled with residual virus replication in the brains of PLWH. Despite advancements in antiretroviral therapeutics, our understanding of the molecular mechanism(s) driving inflammatory processes in the brain remains limited. Recent times have seen the emergence of non-coding RNAs (ncRNAs) as critical regulators of gene expression, underlying the neuroinflammatory processes in HIV infection, NeuroHIV, and their associated comorbidities. This review explores the role of various classes of ncRNAs and their regulatory functions implicated in HIV infection, neuropathogenesis, and related conditions. The dysregulated expression of ncRNAs is known to exacerbate the neuroinflammatory responses, thus contributing to neurocognitive impairments in PLWH. This review also discusses the diagnostic and therapeutic potential of ncRNAs in HIV infection and its comorbidities, suggesting their utility as non-invasive biomarkers and targets for modulating neuroinflammatory pathways. Understanding these regulatory roles could pave the way for novel diagnostic strategies and therapeutic interventions in the context of HIV and its comorbidities.
Affiliation(s)
- Shilpa Buch
  - Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE 68198-5880, USA; (S.S.); (U.M.D.); (S.R.); (A.O.); (E.H.)
- Palsamy Periyasamy
  - Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE 68198-5880, USA; (S.S.); (U.M.D.); (S.R.); (A.O.); (E.H.)
|
17
|
Pulli K, Saarimäki-Vire J, Ahonen P, Liu X, Ibrahim H, Chandra V, Santambrogio A, Wang Y, Vaaralahti K, Iivonen AP, Känsäkoski J, Tommiska J, Kemkem Y, Varjosalo M, Vuoristo S, Andoniadou CL, Otonkoski T, Raivio T. A splice site variant in MADD affects hormone expression in pancreatic β cells and pituitary gonadotropes. JCI Insight 2024; 9:e167598. [PMID: 38775154 PMCID: PMC11141940 DOI: 10.1172/jci.insight.167598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 04/12/2024] [Indexed: 06/02/2024] Open
Abstract
MAPK activating death domain (MADD) is a multifunctional protein regulating small GTPases RAB3 and RAB27, MAPK signaling, and cell survival. Polymorphisms in the MADD locus are associated with glycemic traits, but patients with biallelic variants in MADD manifest a complex syndrome affecting nervous, endocrine, exocrine, and hematological systems. We identified a homozygous splice site variant in MADD in 2 siblings with developmental delay, diabetes, congenital hypogonadotropic hypogonadism, and growth hormone deficiency. This variant led to skipping of exon 30 and in-frame deletion of 36 amino acids. To elucidate how this mutation causes pleiotropic endocrine phenotypes, we generated relevant cellular models with deletion of MADD exon 30 (dex30). We observed reduced numbers of β cells, decreased insulin content, and increased proinsulin-to-insulin ratio in dex30 human embryonic stem cell-derived pancreatic islets. Concordantly, dex30 led to decreased insulin expression in human β cell line EndoC-βH1. Furthermore, dex30 resulted in decreased luteinizing hormone expression in mouse pituitary gonadotrope cell line LβT2 but did not affect ontogeny of stem cell-derived GnRH neurons. Protein-protein interactions of wild-type and dex30 MADD revealed changes affecting multiple signaling pathways, while the GDP/GTP exchange activity of dex30 MADD remained intact. Our results suggest MADD-specific processes regulate hormone expression in pancreatic β cells and pituitary gonadotropes.
Affiliation(s)
- Kristiina Pulli
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Jonna Saarimäki-Vire
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Pekka Ahonen
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Xiaonan Liu
  - Institute of Biotechnology, Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Hazem Ibrahim
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Vikash Chandra
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Alice Santambrogio
  - Centre for Craniofacial and Regenerative Biology, King’s College London, London, United Kingdom
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Yafei Wang
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Kirsi Vaaralahti
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Anna-Pauliina Iivonen
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
- Johanna Känsäkoski
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
  - Department of Physiology, Faculty of Medicine
- Johanna Tommiska
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
  - Department of Physiology, Faculty of Medicine
- Yasmine Kemkem
  - Centre for Craniofacial and Regenerative Biology, King’s College London, London, United Kingdom
- Markku Varjosalo
  - Institute of Biotechnology, Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sanna Vuoristo
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
  - Department of Obstetrics and Gynecology; and
  - HiLIFE, University of Helsinki, Helsinki, Finland
- Cynthia L. Andoniadou
  - Centre for Craniofacial and Regenerative Biology, King’s College London, London, United Kingdom
  - Department of Medicine III, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Timo Otonkoski
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
  - New Children’s Hospital, Helsinki University Hospital, Pediatric Research Center, Helsinki, Finland
- Taneli Raivio
  - Stem Cells and Metabolism Research Program (STEMM), Research Programs Unit, Faculty of Medicine, and
  - Department of Physiology, Faculty of Medicine
  - New Children’s Hospital, Helsinki University Hospital, Pediatric Research Center, Helsinki, Finland
|
18
|
McCue K, Burge CB. An interpretable model of pre-mRNA splicing for animal and plant genes. Sci Adv 2024; 10:eadn1547. [PMID: 38718117 PMCID: PMC11078188 DOI: 10.1126/sciadv.adn1547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 04/04/2024] [Indexed: 05/12/2024]
Abstract
Pre-mRNA splicing is a fundamental step in gene expression, conserved across eukaryotes, in which the spliceosome recognizes motifs at the 3' and 5' splice sites (SSs), excises introns, and ligates exons. SS recognition and pairing is often influenced by protein splicing factors (SFs) that bind to splicing regulatory elements (SREs). Here, we describe SMsplice, a fully interpretable model of pre-mRNA splicing that combines models of core SS motifs, SREs, and exonic and intronic length preferences. We learn models that predict SS locations with 83 to 86% accuracy in fish, insects, and plants and about 70% in mammals. Learned SRE motifs include both known SF binding motifs and unfamiliar motifs, and both motif classes are supported by genetic analyses. Our comparisons across species highlight similarities between non-mammals, increased reliance on intronic SREs in plant splicing, and a greater reliance on SREs in mammalian splicing.
Affiliation(s)
- Kayla McCue
  - Computational and Systems Biology PhD Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
- Christopher B. Burge
  - Computational and Systems Biology PhD Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139
|
19
|
Recinos Y, Ustianenko D, Yeh YT, Wang X, Jacko M, Yesantharao LV, Wu Q, Zhang C. CRISPR-dCas13d-based deep screening of proximal and distal splicing-regulatory elements. Nat Commun 2024; 15:3839. [PMID: 38714659 PMCID: PMC11076525 DOI: 10.1038/s41467-024-47140-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 03/16/2024] [Indexed: 05/10/2024] Open
Abstract
Pre-mRNA splicing, a key process in gene expression, can be therapeutically modulated using various drug modalities, including antisense oligonucleotides (ASOs). However, determining promising targets is hampered by the challenge of systematically mapping splicing-regulatory elements (SREs) in their native sequence context. Here, we use the catalytically inactive CRISPR-RfxCas13d RNA-targeting system (dCas13d/gRNA) as a programmable platform to bind SREs and modulate splicing by competing against endogenous splicing factors. SpliceRUSH, a high-throughput screening method, was developed to map SREs in any gene of interest using a lentivirus gRNA library that tiles the genetic region, including distal intronic sequences. When applied to SMN2, a therapeutic target for spinal muscular atrophy, SpliceRUSH robustly identifies not only known SREs but also a previously unknown distal intronic SRE, which can be targeted to alter exon 7 splicing using either dCas13d/gRNA or ASOs. This technology enables a deeper understanding of splicing regulation with applications for RNA-based drug discovery.
Affiliation(s)
- Yocelyn Recinos
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Dmytro Ustianenko
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
  - Flagship Pioneering, Cambridge, MA, 02142, USA
- Yow-Tyng Yeh
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Xiaojian Wang
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Martin Jacko
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
  - Aperture Therapeutics, Inc., San Carlos, CA, 94070, USA
- Lekha V Yesantharao
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
  - Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Qiyang Wu
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Chaolin Zhang
  - Department of Systems Biology, Columbia University, New York, NY, 10032, USA.
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA.
| |
Collapse
|
20
|
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely utilized to study protein sequences, while few models were developed for genomic sequences and were limited to single species. Due to the lack of genomes from different species, these models cannot effectively leverage evolutionary information. In this study, we have developed SpliceBERT, a language model pretrained on primary ribonucleic acids (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlighted the importance of pretraining genomic language models on a diverse range of species and suggested that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
- Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
|
21
|
Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Affiliation(s)
- Ishika Luthra
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
|
22
|
Farrow SL, Gokuladhas S, Schierding W, Pudjihartono M, Perry JK, Cooper AA, O'Sullivan JM. Identification of 27 allele-specific regulatory variants in Parkinson's disease using a massively parallel reporter assay. NPJ Parkinsons Dis 2024; 10:44. [PMID: 38413607 PMCID: PMC10899198 DOI: 10.1038/s41531-024-00659-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 02/12/2024] [Indexed: 02/29/2024] Open
Abstract
Genome wide association studies (GWAS) have identified a number of genomic loci that are associated with Parkinson's disease (PD) risk. However, the majority of these variants lie in non-coding regions, and thus the mechanisms by which they influence disease development, and/or potential subtypes, remain largely elusive. To address this, we used a massively parallel reporter assay (MPRA) to screen the regulatory function of 5254 variants that have a known or putative connection to PD. We identified 138 loci with enhancer activity, of which 27 exhibited allele-specific regulatory activity in HEK293 cells. The identified regulatory variant(s) typically did not match the original tag variant within the PD associated locus, supporting the need for deeper exploration of these loci. The existence of allele specific transcriptional impacts within HEK293 cells, confirms that at least a subset of the PD associated regions mark functional gene regulatory elements. Future functional studies that confirm the putative targets of the empirically verified regulatory variants will be crucial for gaining a greater understanding of how gene regulatory network(s) modulate PD risk.
Affiliation(s)
- Sophie L Farrow
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- William Schierding
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Department of Ophthalmology, The University of Auckland, Auckland, New Zealand
- Jo K Perry
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Antony A Cooper
- Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
- Justin M O'Sullivan
- Liggins Institute, The University of Auckland, Auckland, New Zealand
- The Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Australian Parkinsons Mission, Garvan Institute of Medical Research, Sydney, NSW, Australia
- Singapore Institute for Clinical Sciences, Agency for Science Technology and Research, Singapore, Singapore
- MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, United Kingdom
|
23
|
Gupta K, Yang C, McCue K, Bastani O, Sharp PA, Burge CB, Solar-Lezama A. Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing. Genome Biol 2024; 25:23. [PMID: 38229106 PMCID: PMC10790492 DOI: 10.1186/s13059-023-03162-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/28/2023] [Indexed: 01/18/2024] Open
Abstract
Sequence-specific RNA-binding proteins (RBPs) play central roles in splicing decisions. Here, we describe a modular splicing architecture that leverages in vitro-derived RNA affinity models for 79 human RBPs and the annotated human genome to produce improved models of RBP binding and activity. Binding and activity are modeled by separate Motif and Aggregator components that can be mixed and matched, enforcing sparsity to improve interpretability. Training a new Adjusted Motif (AM) architecture on the splicing task not only yields better splicing predictions but also improves prediction of RBP-binding sites in vivo and of splicing activity, assessed using independent data.
Affiliation(s)
- Kavi Gupta
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Chenxi Yang
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA
- Kayla McCue
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Osbert Bastani
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Phillip A Sharp
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Armando Solar-Lezama
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
|
24
|
de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Jussi Taipale
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
- Department of Biochemistry, University of Cambridge, Cambridge, UK
|
25
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS: We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION: SpliceAI and Pangolin show the best overall performance among predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
|
26
|
Perchlik M, Sasse A, Mostafavi S, Fields S, Cuperus JT. Impact on splicing in Saccharomyces cerevisiae of random 50-base sequences inserted into an intron. RNA 2023; 30:52-67. [PMID: 37879864 PMCID: PMC10726166 DOI: 10.1261/rna.079752.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023]
Abstract
Intron splicing is a key regulatory step in gene expression in eukaryotes. Three sequence elements required for splicing (5' and 3' splice sites and a branchpoint) are especially well-characterized in Saccharomyces cerevisiae, but our understanding of additional intron features that impact splicing in this organism is incomplete, due largely to its small number of introns. To overcome this limitation, we constructed a library in S. cerevisiae of random 50-nt (N50) elements individually inserted into the intron of a reporter gene and quantified canonical splicing and the use of cryptic splice sites by sequencing analysis. More than 70% of approximately 140,000 N50 elements reduced splicing by at least 20%. N50 features, including higher GC content, presence of GU repeats, and stronger predicted secondary structure of its pre-mRNA, correlated with reduced splicing efficiency. A likely basis for the reduced splicing of such a large proportion of variants is the formation of RNA structures that pair N50 bases, such as the GU repeats, with other bases specifically within the reporter pre-mRNA analyzed. However, multiple models were unable to explain more than a small fraction of the variance in splicing efficiency across the library, suggesting that complex nonlinear interactions in RNA structures are not accurately captured by RNA structure prediction methods. Our results imply that the specific context of a pre-mRNA may determine the bases allowable in an intron to prevent secondary structures that reduce splicing. This large data set can serve as a resource for further exploration of splicing mechanisms.
Affiliation(s)
- Molly Perchlik
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Alexander Sasse
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Sara Mostafavi
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Stanley Fields
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
27
|
Stepankiw N, Yang AWH, Hughes TR. The human genome contains over a million autonomous exons. Genome Res 2023; 33:1865-1878. [PMID: 37945377 PMCID: PMC10760453 DOI: 10.1101/gr.277792.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 10/27/2023] [Indexed: 11/12/2023]
Abstract
Mammalian mRNA and lncRNA exons are often small compared to introns. The exon definition model predicts that exons splice autonomously, dependent on proximal exon sequence features, explaining their delineation within large introns. This model has not been examined on a genome-wide scale, however, leaving open the question of how often mRNA and lncRNA exons are autonomous. It is also unknown how frequently such exons can arise by chance. Here, we directly assayed large fragments (500-1000 bp) of the human genome by exon trapping, which detects exons spliced into a heterologous transgene, here designed with a large intron context. We define the trapped exons as "autonomous." We obtained ∼1.25 million trapped exons, including most known mRNA and well-annotated lncRNA internal exons, demonstrating that human exons are predominantly autonomous. mRNA exons are trapped with the highest efficiency. Nearly a million of the trapped exons are unannotated, most located in intergenic regions and antisense to mRNA, with depletion from the forward strand of introns. These exons are not conserved, suggesting they are nonfunctional and arose from random mutations. They are nonetheless highly enriched with known splicing promoting sequence features that delineate known exons. Novel autonomous exons are more numerous than annotated lncRNA exons, and computational models also indicate they will occur with similar frequency in any randomly generated sequence. These results show that most human coding exons splice autonomously, and provide an explanation for the existence of many unconserved lncRNAs, as well as a new annotation and inclusion levels of spliceable loci in the human genome.
Affiliation(s)
- Nicholas Stepankiw
- Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, Ontario, Canada M5S 3E1
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
|
28
|
Rummel CK, Gagliardi M, Ahmad R, Herholt A, Jimenez-Barron L, Murek V, Weigert L, Hausruckinger A, Maidl S, Hauger B, Raabe FJ, Fürle C, Trastulla L, Turecki G, Eder M, Rossner MJ, Ziller MJ. Massively parallel functional dissection of schizophrenia-associated noncoding genetic variants. Cell 2023; 186:5165-5182.e33. [PMID: 37852259 DOI: 10.1016/j.cell.2023.09.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/18/2021] [Revised: 06/12/2023] [Accepted: 09/14/2023] [Indexed: 10/20/2023]
Abstract
Schizophrenia (SCZ) is a highly heritable mental disorder with thousands of associated genetic variants located mostly in the noncoding space of the genome. Translating these associations into insights regarding the underlying pathomechanisms has been challenging because the causal variants, their mechanisms of action, and their target genes remain largely unknown. We implemented a massively parallel variant annotation pipeline (MVAP) to perform SCZ variant-to-function mapping at scale in disease-relevant neural cell types. This approach identified 620 functional variants (1.7%) that operate in a highly developmental context and neuronal-activity-dependent manner. Multimodal integration of epigenomic and CRISPRi screening data enabled us to link these functional variants to target genes, biological processes, and ultimately alterations of neuronal physiology. These results provide a multistage prioritization strategy to map functional single-nucleotide polymorphism (SNP)-to-gene-to-endophenotype relations and offer biological insights into the context-dependent molecular processes modulated by SCZ-associated genetic variation.
Affiliation(s)
- Christine K Rummel
- Max Planck Institute of Psychiatry, Munich 80804, Germany; International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich 80804, Germany
- Miriam Gagliardi
- Department of Psychiatry, University of Münster, Münster 48149, Germany
- Ruhel Ahmad
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Alexander Herholt
- Department of Psychiatry and Psychotherapy, LMU University Hospital, LMU, Munich 80336, Germany; Systasy Bioscience GmbH, Munich 81669, Germany
- Laura Jimenez-Barron
- Max Planck Institute of Psychiatry, Munich 80804, Germany; International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich 80804, Germany
- Vanessa Murek
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Liesa Weigert
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Susanne Maidl
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Barbara Hauger
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Florian J Raabe
- Department of Psychiatry and Psychotherapy, LMU University Hospital, LMU, Munich 80336, Germany
- Lucia Trastulla
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich 80804, Germany; Department of Psychiatry, University of Münster, Münster 48149, Germany; Technische Universität München Medical Graduate Center Experimental Medicine, Munich 80333, Germany
- Gustavo Turecki
- Douglas Mental Health University Institute, Department of Psychiatry, McGill University, Montreal, QC, Canada
- Matthias Eder
- Max Planck Institute of Psychiatry, Munich 80804, Germany
- Moritz J Rossner
- Department of Psychiatry and Psychotherapy, LMU University Hospital, LMU, Munich 80336, Germany; Systasy Bioscience GmbH, Munich 81669, Germany
- Michael J Ziller
- Max Planck Institute of Psychiatry, Munich 80804, Germany; Department of Psychiatry, University of Münster, Münster 48149, Germany; Center for Soft Nanoscience, University of Münster, Münster 48149, Germany
|
29
|
Liao SE, Sudarshan M, Regev O. Deciphering RNA splicing logic with interpretable machine learning. Proc Natl Acad Sci U S A 2023; 120:e2221165120. [PMID: 37796983 PMCID: PMC10576025 DOI: 10.1073/pnas.2221165120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 08/29/2023] [Indexed: 10/07/2023] Open
Abstract
Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: Despite their excellent accuracy, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed uncharacterized components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Affiliation(s)
- Susan E. Liao
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
- Oded Regev
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
|
30
|
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023] Open
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yi Xing
- Center for Computational and Genomic Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
31
|
Kleinschmidt H, Xu C, Bai L. Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation. Chromosoma 2023; 132:167-189. [PMID: 37184694 PMCID: PMC10542970 DOI: 10.1007/s00412-023-00796-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Revised: 04/25/2023] [Accepted: 04/26/2023] [Indexed: 05/16/2023]
Abstract
Despite the recent explosion in genome-wide studies in chromatin and gene regulation, we are still far from extracting a set of genetic rules that can predict the function of the regulatory genome. One major reason for this deficiency is that gene regulation is a multi-layered process that involves an enormous variable space, which cannot be fully explored using native genomes. This problem can be partially solved by introducing synthetic DNA libraries into cells, a method that can test the regulatory roles of thousands to millions of sequences with limited variables. Here, we review recent applications of this method to study transcription factor (TF) binding, nucleosome positioning, and transcriptional activity. We discuss the design principles, experimental procedures, and major findings from these studies and compare the pros and cons of different approaches.
Affiliation(s)
- Holly Kleinschmidt
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Cheng Xu
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Lu Bai
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
- Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, 16802, USA
- Department of Physics, The Pennsylvania State University, University Park, PA, 16802, USA
|
32
|
Recinos Y, Ustianenko D, Yeh YT, Wang X, Jacko M, Yesantharao LV, Wu Q, Zhang C. Deep screening of proximal and distal splicing-regulatory elements in a native sequence context. bioRxiv 2023:2023.08.21.554109. [PMID: 37662340 PMCID: PMC10473672 DOI: 10.1101/2023.08.21.554109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Pre-mRNA splicing, a key process in gene expression, can be therapeutically modulated using various drug modalities, including antisense oligonucleotides (ASOs). However, determining promising targets is impeded by the challenge of systematically mapping splicing-regulatory elements (SREs) in their native sequence context. Here, we use the catalytically dead CRISPR-RfxCas13d RNA-targeting system (dCas13d/gRNA) as a programmable platform to bind SREs and modulate splicing by competing against endogenous splicing factors. SpliceRUSH, a high-throughput screening method, was developed to map SREs in any gene of interest using a lentivirus gRNA library that tiles the genetic region, including distal intronic sequences. When applied to SMN2, a therapeutic target for spinal muscular atrophy, SpliceRUSH robustly identified not only known SREs, but also a novel distal intronic splicing enhancer, which can be targeted to alter exon 7 splicing using either dCas13d/gRNA or ASOs. This technology enables a deeper understanding of splicing regulation with applications for RNA-based drug discovery.
Affiliation(s)
- Yocelyn Recinos
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
| | - Dmytro Ustianenko
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Flagship Pioneering, Cambridge, MA 02142, USA
| | - Yow-Tyng Yeh
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
| | - Xiaojian Wang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
| | - Martin Jacko
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Aperture Therapeutics, Inc., San Carlos, CA 94070, USA
| | - Lekha V. Yesantharao
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
- Present address: Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Qiyang Wu
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
| | - Chaolin Zhang
- Department of Systems Biology, Department of Biochemistry and Molecular Biophysics, Center for Motor Neuron Biology and Disease, Columbia University, New York, NY 10032, USA
|
33
|
Gosai SJ, Castro RI, Fuentes N, Butts JC, Kales S, Noche RR, Mouri K, Sabeti PC, Reilly SK, Tewhey R. Machine-guided design of synthetic cell type-specific cis-regulatory elements. bioRxiv 2023:2023.08.08.552077. [PMID: 37609287 PMCID: PMC10441439 DOI: 10.1101/2023.08.08.552077] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 08/24/2023]
Abstract
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing, and stimulus responses, which collectively define the thousands of unique cell types in the body. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for an intended purpose has arisen naturally through evolution. Here, we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell type specificity. We leverage innovations in deep neural network modeling of CRE activity across three cell types, efficient in silico optimization, and massively parallel reporter assays (MPRAs) to design and empirically test thousands of CREs. Through in vitro and in vivo validation, we show that synthetic sequences outperform natural sequences from the human genome in driving cell type-specific expression. Synthetic sequences leverage unique sequence syntax to promote activity in the on-target cell type and simultaneously reduce activity in off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs and demonstrate the required literacy to write regulatory code that is fit-for-purpose in vivo across vertebrates.
Affiliation(s)
- SJ Gosai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Graduate Program in Biological and Biomedical Science, Boston, MA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - RI Castro
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - N Fuentes
- The Jackson Laboratory, Bar Harbor, ME, USA
- Harvard College, Harvard University, Cambridge, MA, USA
| | - JC Butts
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
| | - S Kales
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - RR Noche
- Department of Comparative Medicine, Yale School of Medicine, New Haven, CT, USA
- Yale Zebrafish Research Core, Yale School of Medicine, New Haven, CT, USA
| | - K Mouri
- The Jackson Laboratory, Bar Harbor, ME, USA
| | - PC Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - SK Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Wu Tsai Institute, Yale University, New Haven, CT, USA
| | - R Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
|
34
|
Pfaff AL, Bubb VJ, Quinn JP, Koks S. A Genome-Wide Screen for the Exonisation of Reference SINE-VNTR-Alus and Their Expression in CNS Tissues of Individuals with Amyotrophic Lateral Sclerosis. Int J Mol Sci 2023; 24:11548. [PMID: 37511314 PMCID: PMC10380656 DOI: 10.3390/ijms241411548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 07/10/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023] Open
Abstract
The hominid-specific retrotransposon SINE-VNTR-Alu (SVA) is a composite element that has contributed to the genetic variation between individuals and influenced genomic structure and function. SVAs are involved in modulating gene expression and splicing patterns, altering mRNA levels and sequences, and have been associated with the development of disease. We evaluated the genome-wide effects of SVAs present in the reference genome on transcript sequence and expression in the CNS of individuals with and without the neurodegenerative disorder Amyotrophic Lateral Sclerosis (ALS). This study identified SVAs in the exons of 179 known transcripts, several of which were expressed in a tissue-specific manner, as well as 92 novel exonisation events occurring in the motor cortex. An analysis of 65 reference genome SVAs polymorphic for their presence/absence in the ALS consortium cohort did not identify any elements that were significantly associated with disease status, age at onset, and survival. However, there were transcripts, such as transferrin and HLA-A, that were differentially expressed between those with or without disease, and expression levels were associated with the genotype of proximal SVAs. This study demonstrates the functional consequences of several SVA elements altering mRNA splicing patterns and expression levels in tissues of the CNS.
Affiliation(s)
- Abigail L Pfaff
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
| | - Vivien J Bubb
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - John P Quinn
- Department of Pharmacology and Therapeutics, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK
| | - Sulev Koks
- Perron Institute for Neurological and Translational Science, Perth, WA 6009, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Perth, WA 6150, Australia
|
35
|
Kim J, Woo S, de Gusmao CM, Zhao B, Chin DH, DiDonato RL, Nguyen MA, Nakayama T, Hu CA, Soucy A, Kuniholm A, Thornton JK, Riccardi O, Friedman DA, El Achkar CM, Dash Z, Cornelissen L, Donado C, Faour KNW, Bush LW, Suslovitch V, Lentucci C, Park PJ, Lee EA, Patterson A, Philippakis AA, Margus B, Berde CB, Yu TW. A framework for individualized splice-switching oligonucleotide therapy. Nature 2023; 619:828-836. [PMID: 37438524 PMCID: PMC10371869 DOI: 10.1038/s41586-023-06277-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 03/22/2022] [Accepted: 05/25/2023] [Indexed: 07/14/2023]
Abstract
Splice-switching antisense oligonucleotides (ASOs) could be used to treat a subset of individuals with genetic diseases, but the systematic identification of such individuals remains a challenge. Here we performed whole-genome sequencing analyses to characterize genetic variation in 235 individuals (from 209 families) with ataxia-telangiectasia, a severely debilitating and life-threatening recessive genetic disorder, yielding a complete molecular diagnosis in almost all individuals. We developed a predictive taxonomy to assess the amenability of each individual to splice-switching ASO intervention; 9% and 6% of the individuals had variants that were 'probably' or 'possibly' amenable to ASO splice modulation, respectively. Most amenable variants were in deep intronic regions that are inaccessible to exon-targeted sequencing. We developed ASOs that successfully rescued mis-splicing and ATM cellular signalling in patient fibroblasts for two recurrent variants. In a pilot clinical study, one of these ASOs was used to treat a child who had been diagnosed with ataxia-telangiectasia soon after birth, and showed good tolerability without serious adverse events for three years. Our study provides a framework for the prospective identification of individuals with genetic diseases who might benefit from a therapeutic approach involving splice-switching ASOs.
Affiliation(s)
- Jinkuk Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Biomedical Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- KI for Health Science and Technology, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Center for Epidemic Preparedness, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Sijae Woo
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Claudio M de Gusmao
- Department of Neurology, Boston Children's Hospital, Boston, MA, USA
- Postgraduate School of Medical Science, University of Campinas (UNICAMP), São Paulo, Brazil
| | - Boxun Zhao
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Diana H Chin
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Renata L DiDonato
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Minh A Nguyen
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Tojo Nakayama
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Chunguang April Hu
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Aubrie Soucy
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Ashley Kuniholm
- Institutional Center for Clinical and Translational Research, Boston Children's Hospital, Boston, MA, USA
| | | | - Olivia Riccardi
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Danielle A Friedman
- Department of Neurology, Boston Children's Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | | | - Zane Dash
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Laura Cornelissen
- Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Carolina Donado
- Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Kamli N W Faour
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Lynn W Bush
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Center for Bioethics, Harvard Medical School, Boston, MA, USA
| | - Victoria Suslovitch
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Claudia Lentucci
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Eunjung Alice Lee
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Al Patterson
- Harvard Medical School, Boston, MA, USA
- Department of Pharmacy, Boston Children's Hospital, Boston, MA, USA
| | - Anthony A Philippakis
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Brad Margus
- Ataxia Telangiectasia Children's Project, Coconut Creek, FL, USA
| | - Charles B Berde
- Harvard Medical School, Boston, MA, USA
- Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children's Hospital, Boston, MA, USA
| | - Timothy W Yu
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Manton Center for Orphan Disease Research, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
|
36
|
Rummel CK, Gagliardi M, Herholt A, Ahmad R, Murek V, Weigert L, Hausruckinger A, Maidl S, Jimenez-Barron L, Trastulla L, Eder M, Rossner M, Ziller MJ. Cell type and condition specific functional annotation of schizophrenia associated non-coding genetic variants. bioRxiv 2023:2023.06.27.545266. [PMID: 37425902 PMCID: PMC10326990 DOI: 10.1101/2023.06.27.545266] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/11/2023]
Abstract
Schizophrenia (SCZ) is a highly polygenic disease and genome wide association studies have identified thousands of genetic variants that are statistically associated with this psychiatric disorder. However, our ability to translate these associations into insights on the disease mechanisms has been challenging since the causal genetic variants, their molecular function and their target genes remain largely unknown. In order to address these questions, we established a functional genomics pipeline in combination with induced pluripotent stem cell technology to functionally characterize ~35,000 non-coding genetic variants associated with schizophrenia along with their target genes. This analysis identified a set of 620 (1.7%) single nucleotide polymorphisms as functional on a molecular level in a highly cell type and condition specific fashion. These results provide a high-resolution map of functional variant-gene combinations and offer comprehensive biological insights into the developmental context and stimulation dependent molecular processes modulated by SCZ associated genetic variation.
Affiliation(s)
- Christine K. Rummel
- Max Planck Institute of Psychiatry, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | - Miriam Gagliardi
- Department of Psychiatry, University of Münster, Münster, Germany
| | - Alexander Herholt
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Ruhel Ahmad
- Max Planck Institute of Psychiatry, Munich, Germany
| | | | | | | | | | - Laura Jimenez-Barron
- Max Planck Institute of Psychiatry, Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich, Germany
| | - Lucia Trastulla
- Department of Psychiatry, University of Münster, Münster, Germany
| | - Mathias Eder
- Max Planck Institute of Psychiatry, Munich, Germany
| | - Moritz Rossner
- Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
| | - Michael J. Ziller
- Max Planck Institute of Psychiatry, Munich, Germany
- Department of Psychiatry, University of Münster, Münster, Germany
- Center for Soft Nanoscience, University of Münster, Münster, Germany
|
37
|
Zabardast A, Tamer EG, Son YA, Yılmaz A. An automated framework for evaluation of deep learning models for splice site predictions. Sci Rep 2023; 13:10221. [PMID: 37353532 PMCID: PMC10290104 DOI: 10.1038/s41598-023-34795-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Accepted: 05/08/2023] [Indexed: 06/25/2023] Open
Abstract
A novel framework for the automated evaluation of various deep learning-based splice site detectors is presented. The framework eliminates time-consuming development and experimenting activities for different codebases, architectures, and configurations to obtain the best models for a given RNA splice site dataset. RNA splicing is a cellular process in which pre-mRNAs are processed into mature mRNAs and used to produce multiple mRNA transcripts from a single gene sequence. Since the advancement of sequencing technologies, many splice site variants have been identified and associated with the diseases. So, RNA splice site prediction is essential for gene finding, genome annotation, disease-causing variants, and identification of potential biomarkers. Recently, deep learning models performed highly accurately for classifying genomic signals. Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and its bidirectional version (BLSTM), Gated Recurrent Unit (GRU), and its bidirectional version (BGRU) are promising models. During genomic data analysis, CNN's locality feature helps where each nucleotide correlates with other bases in its vicinity. In contrast, BLSTM can be trained bidirectionally, allowing sequential data to be processed from forward and reverse directions. Therefore, it can process 1-D encoded genomic data effectively. Even though both methods have been used in the literature, a performance comparison was missing. To compare selected models under similar conditions, we have created a blueprint for a series of networks with five different levels. As a case study, we compared CNN and BLSTM models' learning capabilities as building blocks for RNA splice site prediction in two different datasets. 
Overall, CNN performed better with [Formula: see text] accuracy ([Formula: see text] improvement), [Formula: see text] F1 score ([Formula: see text] improvement), and [Formula: see text] AUC-PR ([Formula: see text] improvement) in human splice site prediction. Likewise, an outperforming performance with [Formula: see text] accuracy ([Formula: see text] improvement), [Formula: see text] F1 score ([Formula: see text] improvement), and [Formula: see text] AUC-PR ([Formula: see text] improvement) is achieved in C. elegans splice site prediction. Overall, our results showed that CNN learns faster than BLSTM and BGRU. Moreover, CNN performs better at extracting sequence patterns than BLSTM and BGRU. To our knowledge, no other framework is developed explicitly for evaluating splice detection models to decide the best possible model in an automated manner. So, the proposed framework and the blueprint would help selecting different deep learning models, such as CNN vs. BLSTM and BGRU, for splice site analysis or similar classification tasks and in different problems.
Affiliation(s)
- Amin Zabardast
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Elif Güney Tamer
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Yeşim Aydın Son
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Arif Yılmaz
- Institute of Data Science, Maastricht University, Maastricht, The Netherlands
|
38
|
Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, Meyerson M, Evans BJ, Fairbrother WG. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A 2023; 120:e2218308120. [PMID: 37192163 PMCID: PMC10214146 DOI: 10.1073/pnas.2218308120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/12/2023] [Indexed: 05/18/2023] Open
Abstract
Humans coexisted and interbred with other hominins which later became extinct. These archaic hominins are known to us only through fossil records and for two cases, genome sequences. Here, we engineer Neanderthal and Denisovan sequences into thousands of artificial genes to reconstruct the pre-mRNA processing patterns of these extinct populations. Of the 5,169 alleles tested in this massively parallel splicing reporter assay (MaPSy), we report 962 exonic splicing mutations that correspond to differences in exon recognition between extant and extinct hominins. Using MaPSy splicing variants, predicted splicing variants, and splicing quantitative trait loci, we show that splice-disrupting variants experienced greater purifying selection in anatomically modern humans than that in Neanderthals. Adaptively introgressed variants were enriched for moderate-effect splicing variants, consistent with positive selection for alternative spliced alleles following introgression. As particularly compelling examples, we characterized a unique tissue-specific alternative splicing variant at the adaptively introgressed innate immunity gene TLR1, as well as a unique Neanderthal introgressed alternative splicing variant in the gene HSPG2 that encodes perlecan. We further identified potentially pathogenic splicing variants found only in Neanderthals and Denisovans in genes related to sperm maturation and immunity. Finally, we found splicing variants that may contribute to variation among modern humans in total bilirubin, balding, hemoglobin levels, and lung capacity. Our findings provide unique insights into natural selection acting on splicing in human evolution and demonstrate how functional assays can be used to identify candidate causal variants underlying differences in gene regulation and phenotype.
Affiliation(s)
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Christopher R. Neil
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Anastasia Welch
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Chaorui Duan
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Samantha Maguire
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ijeoma C. Meremikwu
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Malcolm Meyerson
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
| | - Ben J. Evans
- Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
| | - William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912
- Hassenfeld Child Health Innovation Institute of Brown University, Providence, RI 02912
|
39
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. bioRxiv 2023:2023.05.04.539398. [PMID: 37205456 PMCID: PMC10187268 DOI: 10.1101/2023.05.04.539398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
Background: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. Results: We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, was lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive and neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. Conclusion: SpliceAI and Pangolin showed the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| | - Jacob O. Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
40
|
Wagner N, Çelik MH, Hölzlwimmer FR, Mertes C, Prokisch H, Yépez VA, Gagneur J. Aberrant splicing prediction across human tissues. Nat Genet 2023; 55:861-870. [PMID: 37142848 DOI: 10.1038/s41588-023-01373-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/05/2022] [Accepted: 03/14/2023] [Indexed: 05/06/2023]
Abstract
Aberrant splicing is a major cause of genetic disorders but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve maximum 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision by threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.
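The abstract above reports precision at a fixed recall (for example, 12% precision at 20% recall). As a minimal sketch of how such a figure can be read off a ranked list of variant scores, the Python below uses one common convention: the highest precision over all score thresholds whose recall reaches the target. The function name `precision_at_recall` and the toy scores are hypothetical illustrations, not taken from the paper, and the authors' exact evaluation procedure may differ.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> float:
    """Best precision over score thresholds whose recall is >= target_recall.

    labels are 1 (true aberrant-splicing event) or 0 (no event).
    This is an illustrative convention, not the paper's own code.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Tied scores share a threshold, so evaluate only after the last tie.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        recall = true_pos / total_pos
        if recall >= target_recall:
            best = max(best, true_pos / i)
    return best


# Toy example: five variants, two of which are true positives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
print(precision_at_recall(toy_scores, toy_labels, 0.5))
print(precision_at_recall(toy_scores, toy_labels, 1.0))
```

Reporting precision at a fixed recall makes models comparable at the same sensitivity, which is how the abstract contrasts the DNA-only baselines with the tissue-aware model.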
Affiliation(s)
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
| | - Muhammed H Çelik
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
| | - Florian R Hölzlwimmer
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Christian Mertes
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
| | - Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
| | - Vicente A Yépez
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
| |
|
41
|
Ezoe A, Iuchi S, Sakurai T, Aso Y, Tokunaga H, Vu AT, Utsumi Y, Takahashi S, Tanaka M, Ishida J, Ishitani M, Seki M. Fully sequencing the cassava full-length cDNA library reveals unannotated transcript structures and alternative splicing events in regions with a high density of single nucleotide variations, insertions-deletions, and heterozygous sequences. PLANT MOLECULAR BIOLOGY 2023; 112:33-45. [PMID: 37014509 DOI: 10.1007/s11103-023-01346-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 02/27/2023] [Indexed: 05/09/2023]
Abstract
The primary transcript structure provides critical insights into protein diversity, transcriptional modification, and functions. Cassava transcript structures are highly diverse because of alternative splicing (AS) events and high heterozygosity. To precisely determine and characterize transcript structures, fully sequencing cloned transcripts is the most reliable method. However, cassava annotations were mainly determined according to fragmentation-based sequencing analyses (e.g., EST and short-read RNA-seq). In this study, we sequenced the cassava full-length cDNA library, which included rare transcripts. We obtained 8,628 non-redundant fully sequenced transcripts and detected 615 unannotated AS events and 421 unannotated loci. The different protein sequences resulting from the unannotated AS events tended to have diverse functional domains, implying that unannotated AS contributes to the truncation of functional domains. The unannotated loci tended to be derived from orphan genes, implying that the loci may be associated with cassava-specific traits. Unexpectedly, individual cassava transcripts were more likely to have multiple AS events than Arabidopsis transcripts, suggestive of the regulated interactions between cassava splicing-related complexes. We also observed that the unannotated loci and/or AS events were commonly in regions with abundant single nucleotide variations, insertions-deletions, and heterozygous sequences. These findings reflect the utility of completely sequenced FLcDNA clones for overcoming cassava-specific annotation-related problems to elucidate transcript structures. Our work provides researchers with transcript structural details that are useful for annotating highly diverse and unique transcripts and alternative splicing events.
Affiliation(s)
- Akihiro Ezoe
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
| | - Satoshi Iuchi
- Experimental Plant Division, RIKEN BioResource Research Center, Tsukuba, Ibaraki, 305-0074, Japan
| | - Tetsuya Sakurai
- Multidisciplinary Science Cluster, Interdisciplinary Science Unit, Kochi University, Nankoku, Kochi, 783-8502, Japan
| | - Yukie Aso
- Experimental Plant Division, RIKEN BioResource Research Center, Tsukuba, Ibaraki, 305-0074, Japan
| | - Hiroki Tokunaga
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Tropical Agriculture Research Front, Japan International Research Center for Agricultural Sciences, Ishigaki, Okinawa, 907-0002, Japan
| | - Anh Thu Vu
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
| | - Yoshinori Utsumi
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
| | - Satoshi Takahashi
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Maho Tanaka
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Junko Ishida
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
| | - Manabu Ishitani
- International Center for Tropical Agriculture (CIAT), Km 17, Recta Cali-Palmira Apartado Aéreo 6713, Cali, Colombia
| | - Motoaki Seki
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan.
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka-cho, Totsuka-ku, Yokohama, Kanagawa, 244-0813, Japan.
| |
|
42
|
Arunima A, van Schaik EJ, Samuel JE. The emerging roles of long non-coding RNA in host immune response and intracellular bacterial infections. Front Cell Infect Microbiol 2023; 13:1160198. [PMID: 37153158 PMCID: PMC10160451 DOI: 10.3389/fcimb.2023.1160198] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 02/06/2023] [Accepted: 04/07/2023] [Indexed: 05/09/2023] Open
Abstract
The long non-coding RNAs (lncRNAs) are evolutionarily conserved classes of non-coding regulatory transcripts of > 200 nucleotides in length. They modulate several transcriptional and post-transcriptional events in the organism. Depending on their cellular localization and interactions, they regulate chromatin function and assembly; and alter the stability and translation of cytoplasmic mRNAs. Although their proposed range of functionality remains controversial, there is increasing research evidence that lncRNAs play a regulatory role in the activation, differentiation and development of immune signaling cascades; microbiome development; and in diseases such as neuronal and cardiovascular disorders; cancer; and pathogenic infections. This review discusses the functional roles of different lncRNAs in regulation of host immune responses, signaling pathways during host-microbe interaction and infection caused by obligate intracellular bacterial pathogens. The study of lncRNAs is assuming significance as it could be exploited for development of alternative therapeutic strategies for the treatment of severe and chronic pathogenic infections caused by Mycobacterium, Chlamydia and Rickettsia infections, as well as commensal colonization. Finally, this review summarizes the translational potential of lncRNA research in development of diagnostic and prognostic tools for human diseases.
Affiliation(s)
| | | | - James E. Samuel
- Department of Microbial Pathogenesis and Immunology, School of Medicine, Texas A&M University, Bryan, TX, United States
| |
|
43
|
Rogalska ME, Vivori C, Valcárcel J. Regulation of pre-mRNA splicing: roles in physiology and disease, and therapeutic prospects. Nat Rev Genet 2023; 24:251-269. [PMID: 36526860 DOI: 10.1038/s41576-022-00556-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 69.0] [Accepted: 11/10/2022] [Indexed: 12/23/2022]
Abstract
The removal of introns from mRNA precursors and its regulation by alternative splicing are key for eukaryotic gene expression and cellular function, as evidenced by the numerous pathologies induced or modified by splicing alterations. Major recent advances have been made in understanding the structures and functions of the splicing machinery, in the description and classification of physiological and pathological isoforms and in the development of the first therapies for genetic diseases based on modulation of splicing. Here, we review this progress and discuss important remaining challenges, including predicting splice sites from genomic sequences, understanding the variety of molecular mechanisms and logic of splicing regulation, and harnessing this knowledge for probing gene function and disease aetiology and for the design of novel therapeutic approaches.
Affiliation(s)
- Malgorzata Ewa Rogalska
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Claudia Vivori
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- The Francis Crick Institute, London, UK
| | - Juan Valcárcel
- Genome Biology Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
| |
|
44
|
García-Ruiz S, Zhang D, Gustavsson EK, Rocamora-Perez G, Grant-Peters M, Fairbrother-Browne A, Reynolds RH, Brenton JW, Gil-Martínez AL, Chen Z, Rio DC, Botia JA, Guelfi S, Collado-Torres L, Ryten M. Splicing accuracy varies across human introns, tissues and age. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.29.534370. [PMID: 37034741 PMCID: PMC10081249 DOI: 10.1101/2023.03.29.534370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these splicing inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its regulators. Using publicly available data on short-hairpin RNA-knockdowns of numerous spliceosomal components and related regulators, we found support for the importance of RNA-binding proteins in mis-splicing. We also demonstrated that age is positively correlated with mis-splicing, and it affects genes implicated in neurodegenerative diseases. This in-depth characterisation of mis-splicing can have important implications for our understanding of the role of splicing inaccuracies in human disease and the interpretation of long-read RNA-sequencing data.
Affiliation(s)
- S García-Ruiz
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| | - D Zhang
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
| | - E K Gustavsson
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| | - G Rocamora-Perez
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
| | - M Grant-Peters
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| | - A Fairbrother-Browne
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
| | - R H Reynolds
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| | - J W Brenton
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| | - A L Gil-Martínez
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
| | - Z Chen
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
| | - D C Rio
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720, USA
| | - J A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
| | - S Guelfi
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- Verge Genomics, South San Francisco, CA, 94080, USA
| | - L Collado-Torres
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
| | - M Ryten
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
| |
|
45
|
Gallego Romero I, Lea AJ. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol 2023; 24:26. [PMID: 36788564 PMCID: PMC9926830 DOI: 10.1186/s13059-023-02856-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/08/2022] [Accepted: 01/17/2023] [Indexed: 02/16/2023] Open
Abstract
A long-standing goal of evolutionary biology is to decode how gene regulation contributes to organismal diversity. Doing so is challenging because it is hard to predict function from non-coding sequence and to perform molecular research with non-model taxa. Massively parallel reporter assays (MPRAs) enable the testing of thousands to millions of sequences for regulatory activity simultaneously. Here, we discuss the execution, advantages, and limitations of MPRAs, with a focus on evolutionary questions. We propose solutions for extending MPRAs to rare taxa and those with limited genomic resources, and we underscore MPRA's broad potential for driving genome-scale, functional studies across organisms.
Affiliation(s)
- Irene Gallego Romero
- Melbourne Integrative Genomics, University of Melbourne, Royal Parade, Parkville, Victoria, 3010, Australia
- School of BioSciences, The University of Melbourne, Royal Parade, Parkville, 3010, Australia
- The Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, 30 Royal Parade, Parkville, Victoria, 3010, Australia
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Riia 23b, 51010, Tartu, Estonia
| | - Amanda J. Lea
- Department of Biological Sciences, Vanderbilt University, Nashville, TN 37240, USA
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37240, USA
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37240, USA
- Child and Brain Development Program, Canadian Institute for Advanced Study, Toronto, Canada
| |
|
46
|
Horn T, Gosliga A, Li C, Enculescu M, Legewie S. Position-dependent effects of RNA-binding proteins in the context of co-transcriptional splicing. NPJ Syst Biol Appl 2023; 9:1. [PMID: 36653378 PMCID: PMC9849329 DOI: 10.1038/s41540-022-00264-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2021] [Accepted: 12/08/2022] [Indexed: 01/19/2023] Open
Abstract
Alternative splicing is an important step in eukaryotic mRNA pre-processing which increases the complexity of gene expression programs, but is frequently altered in disease. Previous work on the regulation of alternative splicing has demonstrated that splicing is controlled by RNA-binding proteins (RBPs) and by epigenetic DNA/histone modifications which affect splicing by changing the speed of polymerase-mediated pre-mRNA transcription. The interplay of these different layers of splicing regulation is poorly understood. In this paper, we derived mathematical models describing how splicing decisions in a three-exon gene are made by combinatorial spliceosome binding to splice sites during ongoing transcription. We additionally take into account the effect of a regulatory RBP and find that the RBP binding position within the sequence is a key determinant of how RNA polymerase velocity affects splicing. Based on these results, we explain paradoxical observations in the experimental literature and further derive rules explaining why the same RBP can act as inhibitor or activator of cassette exon inclusion depending on its binding position. Finally, we derive a stochastic description of co-transcriptional splicing regulation at the single-cell level and show that splicing outcomes show little noise and follow a binomial distribution despite complex regulation by a multitude of factors. Taken together, our simulations demonstrate the robustness of splicing outcomes and reveal that quantitative insights into kinetic competition of co-transcriptional events are required to fully understand this important mechanism of gene expression diversity.
Affiliation(s)
- Timur Horn
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
| | - Alison Gosliga
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany
| | - Congxin Li
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany
| | - Mihaela Enculescu
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.
| | - Stefan Legewie
- Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany.
- University of Stuttgart, Department of Systems Biology and Stuttgart Research Center Systems Biology (SRCSB), Allmandring 31, 70569, Stuttgart, Germany.
| |
|
47
|
Application of massive parallel reporter analysis in biotechnology and medicine. КЛИНИЧЕСКАЯ ПРАКТИКА 2023. [DOI: 10.17816/clinpract115063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The development and functioning of an organism relies on tissue-specific gene programs. Genome regulatory elements play a key role in the regulation of such programs, and disruptions in their function can lead to the development of various pathologies, including cancers, malformations and autoimmune diseases. The emergence of high-throughput genomic studies has led to massively parallel reporter analysis (MPRA) methods, which allow the functional verification and identification of regulatory elements on a genome-wide scale. Initially MPRA was used as a tool to investigate fundamental aspects of epigenetics, but the approach also has great potential for clinical and practical biotechnology. Currently, MPRA is used for validation of clinically significant mutations, identification of tissue-specific regulatory elements, search for the most promising loci for transgene integration, and is an indispensable tool for creating highly efficient expression systems, the range of application of which extends from approaches for protein development and design of next-generation therapeutic antibody superproducers to gene therapy. In this review, the main principles and areas of practical application of high-throughput reporter assays will be discussed.
|
48
|
Apostolidi M, Stamatopoulou V. Aberrant splicing in human cancer: An RNA structural code point of view. Front Pharmacol 2023; 14:1137154. [PMID: 36909167 PMCID: PMC9995731 DOI: 10.3389/fphar.2023.1137154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Accepted: 02/14/2023] [Indexed: 02/25/2023] Open
Abstract
Alternative splicing represents an essential process that occurs widely in eukaryotes. In humans, most genes undergo alternative splicing to ensure transcriptome and proteome diversity reflecting their functional complexity. Over the last decade, aberrantly spliced transcripts due to mutations in cis- or trans-acting splicing regulators have been tightly associated with cancer development, largely drawing scientific attention. Although a plethora of single proteins, ribonucleoproteins, complexed RNAs, and short RNA sequences have emerged as nodal contributors to the splicing cascade, the role of RNA secondary structures in warranting splicing fidelity has been underestimated. Recent studies have leveraged the establishment of novel high-throughput methodologies and bioinformatic tools to shed light on an additional layer of splicing regulation in the context of RNA structural elements. This short review focuses on the most recent available data on splicing mechanism regulation on the basis of RNA secondary structure, emphasizing the importance of the complex RNA G-quadruplex structures (rG4s), and other specific RNA motifs identified as splicing silencers or enhancers. Moreover, it intends to provide knowledge on newly established techniques that allow the identification of RNA structural elements and highlight the potential to develop new RNA-oriented therapeutic strategies against cancer.
Affiliation(s)
- Maria Apostolidi
- Agilent Laboratories, Agilent Technologies, Santa Clara, CA, United States
| | | |
|
49
|
Barbosa P, Savisaar R, Carmo-Fonseca M, Fonseca A. Computational prediction of human deep intronic variation. Gigascience 2022; 12:giad085. [PMID: 37878682 PMCID: PMC10599398 DOI: 10.1093/gigascience/giad085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2023] [Revised: 06/07/2023] [Accepted: 09/20/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND: The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce. RESULTS: In this study, we have provided an overview of the computational methods available and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground truth information, but the use of these tools results in decreased predictive power when compared to black box methods. CONCLUSIONS: Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.
Affiliation(s)
- Pedro Barbosa
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
| | | | - Maria Carmo-Fonseca
- Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028, Lisboa, Portugal
| | - Alcides Fonseca
- LASIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
| |
|
50
|
Leman R, Parfait B, Vidaud D, Girodon E, Pacot L, Le Gac G, Ka C, Ferec C, Fichou Y, Quesnelle C, Aucouturier C, Muller E, Vaur D, Castera L, Boulouard F, Ricou A, Tubeuf H, Soukarieh O, Gaildrat P, Riant F, Guillaud‐Bataille M, Caputo SM, Caux‐Moncoutier V, Boutry‐Kryza N, Bonnet‐Dorion F, Schultz I, Rossing M, Quenez O, Goldenberg L, Harter V, Parsons MT, Spurdle AB, Frébourg T, Martins A, Houdayer C, Krieger S. SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing. Hum Mutat 2022; 43:2308-2323. [PMID: 36273432 PMCID: PMC10946553 DOI: 10.1002/humu.24491] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 01/04/2022] [Revised: 10/06/2022] [Accepted: 10/18/2022] [Indexed: 01/25/2023]
Abstract
Modeling splicing is essential for tackling the challenge of variant interpretation as each nucleotide variation can be pathogenic by affecting pre-mRNA splicing via disruption/creation of splicing motifs such as 5'/3' splice sites, branch sites, or splicing regulatory elements. Unfortunately, most in silico tools focus on a specific type of splicing motif, which is why we developed the Splicing Prediction Pipeline (SPiP) to perform, in one single bioinformatic analysis based on a machine learning approach, a comprehensive assessment of the variant effect on different splicing motifs. We gathered a curated set of 4616 variants scattered all along the sequence of 227 genes, with their corresponding splicing studies. The Bayesian analysis provided us with the number of control variants, that is, variants without impact on splicing, to mimic the deluge of variants from high-throughput sequencing data. Results show that SPiP can deal with the diversity of splicing alterations, with 83.13% sensitivity and 99% specificity to detect spliceogenic variants. Overall performance as measured by area under the receiving operator curve was 0.986, better than SpliceAI and SQUIRLS (0.965 and 0.766) for the same data set. SPiP lends itself to a unique suite for comprehensive prediction of spliceogenicity in the genomic medicine era. SPiP is available at: https://sourceforge.net/projects/splicing-prediction-pipeline/.
Affiliation(s)
- Raphaël Leman
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- UNICAEN, Normandie Université, Caen, France
| | - Béatrice Parfait
- Service de Génétique et Biologie Moléculaires, APHP, HUPC, Hôpital Cochin, Paris, France
| | - Dominique Vidaud
- Service de Génétique et Biologie Moléculaires, APHP, HUPC, Hôpital Cochin, Paris, France
| | - Emmanuelle Girodon
- Service de Génétique et Biologie Moléculaires, APHP, HUPC, Hôpital Cochin, Paris, France
| | - Laurence Pacot
- Service de Génétique et Biologie Moléculaires, APHP, HUPC, Hôpital Cochin, Paris, France
| | - Gérald Le Gac
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, Brest, France
| | - Chandran Ka
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, Brest, France
| | - Claude Ferec
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, Brest, France
| | - Yann Fichou
- Inserm UMR1078, Genetics, Functional Genomics and Biotechnology, Université de Bretagne Occidentale, Brest, France
| | - Céline Quesnelle
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
| | - Camille Aucouturier
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Etienne Muller
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
| | - Dominique Vaur
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Laurent Castera
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Flavie Boulouard
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Agathe Ricou
- Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | - Hélène Tubeuf
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
- Integrative Biosoftware, Rouen, France
| | - Omar Soukarieh
- Inserm U1245, UNIROUEN, FHU‐G4 génomique, Normandie Université, Rouen, France
| | | | - Florence Riant
- Laboratoire de Génétique, AP‐HPGH Saint‐Louis‐Lariboisière‐Fernand WidalParisFrance
| | | | - Sandrine M. Caputo
- Department of Genetics, Institut CurieParis Sciences Lettres Research UniversityParisFrance
| | | | - Nadia Boutry‐Kryza
- Unité Mixte de Génétique Constitutionnelle des Cancers FréquentsHospices Civils de LyonLyonFrance
| | - Françoise Bonnet‐Dorion
- Departement de Biopathologie Unité de Génétique ConstitutionnelleInstitut Bergonie—INSERM U1218BordeauxFrance
| | - Ines Schultz
- Laboratoire d'OncogénétiqueCentre Paul StraussStrasbourgFrance
| | - Maria Rossing
- Centre for Genomic Medicine, RigshospitaletUniversity of CopenhagenCopenhagenDenmark
| | - Olivier Quenez
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Louis Goldenberg
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Valentin Harter
- Department of BiostatisticsBaclesse Unicancer CenterCaenFrance
| | - Michael T. Parsons
- Department of Genetics and Computational BiologyQIMR Berghofer Medical Research InstituteHerstonQueenslandAustralia
| | - Amanda B. Spurdle
- Department of Genetics and Computational BiologyQIMR Berghofer Medical Research InstituteHerstonQueenslandAustralia
| | - Thierry Frébourg
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
- Department of geneticsRouen University HospitalRouenFrance
| | - Alexandra Martins
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
| | - Claude Houdayer
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
- Department of geneticsRouen University HospitalRouenFrance
| | - Sophie Krieger
- Laboratoire de Biologie et Génétique du CancerCentre François BaclesseCaenFrance
- Inserm U1245, UNIROUEN, FHU‐G4 génomiqueNormandie UniversitéRouenFrance
- UNICAENNormandie UniversitéCaenFrance
|