1
Nicolas-Martinez EC, Robinson O, Pflueger C, Gardner A, Corbett MA, Ritchie T, Kroes T, van Eyk CL, Scheffer IE, Hildebrand MS, Barnier JV, Rousseau V, Genevieve D, Haushalter V, Piton A, Denommé-Pichon AS, Bruel AL, Nambot S, Isidor B, Grigg J, Gonzalez T, Ghedia S, Marchant RG, Bournazos A, Wong WK, Webster RI, Evesson FJ, Jones KJ, Cooper ST, Lister R, Gecz J, Jolly LA. RNA variant assessment using transactivation and transdifferentiation. Am J Hum Genet 2024; 111:1673-1699. [PMID: 39084224 DOI: 10.1016/j.ajhg.2024.06.018] [Received: 03/08/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 08/02/2024]
Abstract
Understanding the impact of splicing and nonsense variants on RNA is crucial for resolving variant classification and for assessing the suitability of these variants for precision medicine interventions. This is primarily enabled through RNA studies involving transcriptomics followed by targeted assays using RNA isolated from clinically accessible tissues (CATs), such as blood or skin, of affected individuals. Insufficient disease gene expression in CATs does, however, pose a major barrier to RNA-based investigations, which we show is relevant to 1,436 Mendelian disease genes. We term these "silent" Mendelian genes (SMGs), the largest portion (36%) of which are associated with neurological disorders. To overcome this limitation, we developed two approaches to induce SMG expression in human dermal fibroblasts (HDFs): CRISPR-activation-based gene transactivation and fibroblast-to-neuron transdifferentiation. Initial transactivation screens involving 40 SMGs stimulated our development of a highly multiplexed transactivation system, culminating in 6- to 90,000-fold induction of expression of all 20 (100%) SMGs tested in HDFs. Transdifferentiation of HDFs directly to neurons led to expression of 193 of 516 (37.4%) SMGs implicated in neurological disease. The magnitude and isoform diversity of SMG expression following either transactivation or transdifferentiation were comparable to those in clinically relevant tissues. We apply transdifferentiation and/or gene transactivation combined with short- and long-read RNA sequencing to investigate the impact that variants in USH2A, SCN1A, DMD, and PAK3 have on RNA, using HDFs derived from affected individuals. Transactivation and transdifferentiation represent rapid, scalable functional genomic solutions to investigate variants impacting SMGs in the patient cell and genomic context.
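To make the notions of a "silent" gene and of fold induction concrete, here is a minimal Python sketch. The expression cutoff (1 TPM), the pseudocount, the gene names, and the values are hypothetical illustrations; the abstract does not state the criteria the authors used to define SMGs or to compute induction.

```python
from typing import Dict, List

# Hypothetical expression table: gene -> {tissue: TPM}. Names and values are illustrative only.
Expression = Dict[str, Dict[str, float]]


def silent_genes(expr: Expression, cats: List[str], tpm_cutoff: float = 1.0) -> List[str]:
    """Genes whose expression is below tpm_cutoff in every listed clinically accessible tissue.

    A tissue missing from a gene's record is treated as 0 TPM (an assumption of this sketch).
    """
    return sorted(
        gene
        for gene, by_tissue in expr.items()
        if all(by_tissue.get(tissue, 0.0) < tpm_cutoff for tissue in cats)
    )


def fold_induction(before_tpm: float, after_tpm: float, pseudocount: float = 0.01) -> float:
    """Ratio of expression after vs. before transactivation; the pseudocount avoids dividing by zero."""
    return (after_tpm + pseudocount) / (before_tpm + pseudocount)


if __name__ == "__main__":
    expr = {
        "GENE_A": {"blood": 0.1, "fibroblast": 0.3},   # below cutoff in both CATs
        "GENE_B": {"blood": 12.0, "fibroblast": 0.0},  # expressed in blood
    }
    print(silent_genes(expr, ["blood", "fibroblast"]))  # ['GENE_A']
    print(fold_induction(0.0, 9.99))                     # roughly 1000
```

Because the fold change for a gene starting near zero depends heavily on the pseudocount, very large induction values should be read together with the absolute post-induction expression.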
Affiliation(s)
- Emmylou C Nicolas-Martinez
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; School of Biomedicine, University of Adelaide, Adelaide, SA 5005, Australia
- Olivia Robinson
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; School of Biomedicine, University of Adelaide, Adelaide, SA 5005, Australia
- Christian Pflueger
- Harry Perkins Institute of Medical Research, Nedlands, WA 6009, Australia; Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, The University of Western Australia, Crawley, WA 6009, Australia; The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia
- Alison Gardner
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Mark A Corbett
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Tarin Ritchie
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Thessa Kroes
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Clare L van Eyk
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia
- Ingrid E Scheffer
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia; Murdoch Children's Research Institute, Parkville, VIC 3052, Australia; Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia
- Michael S Hildebrand
- Epilepsy Research Centre, Department of Medicine, The University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia; Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Parkville, VIC 3052, Australia; The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia
- Jean-Vianney Barnier
- Institut des Neurosciences Paris-Saclay, UMR 9197, CNRS, Université Paris-Saclay, Saclay, France
- Véronique Rousseau
- Institut des Neurosciences Paris-Saclay, UMR 9197, CNRS, Université Paris-Saclay, Saclay, France
- David Genevieve
- Montpellier University, Inserm U1183, Reference Center for Rare Diseases Developmental Anomaly and Malformative Syndromes, Genetics Department, Montpellier Hospital, Montpellier, France
- Virginie Haushalter
- Genetic Diagnosis Laboratory, Strasbourg University Hospital, Strasbourg, France
- Amélie Piton
- Genetic Diagnosis Laboratory, Strasbourg University Hospital, Strasbourg, France
- Anne-Sophie Denommé-Pichon
- CRMRs "Anomalies du Développement et syndromes malformatifs" et "Déficiences Intellectuelles de causes rares", Centre de Génétique, CHU Dijon, Dijon, France; INSERM UMR1231, GAD "Génétique des Anomalies du Développement," FHU-TRANSLAD, University of Burgundy, Dijon, France
- Ange-Line Bruel
- CRMRs "Anomalies du Développement et syndromes malformatifs" et "Déficiences Intellectuelles de causes rares", Centre de Génétique, CHU Dijon, Dijon, France; INSERM UMR1231, GAD "Génétique des Anomalies du Développement," FHU-TRANSLAD, University of Burgundy, Dijon, France
- Sophie Nambot
- CRMRs "Anomalies du Développement et syndromes malformatifs" et "Déficiences Intellectuelles de causes rares", Centre de Génétique, CHU Dijon, Dijon, France; INSERM UMR1231, GAD "Génétique des Anomalies du Développement," FHU-TRANSLAD, University of Burgundy, Dijon, France
- Bertrand Isidor
- CRMRs "Anomalies du Développement et syndromes malformatifs" et "Déficiences Intellectuelles de causes rares", Centre de Génétique, CHU Dijon, Dijon, France; INSERM UMR1231, GAD "Génétique des Anomalies du Développement," FHU-TRANSLAD, University of Burgundy, Dijon, France
- John Grigg
- Speciality of Ophthalmology, Save Sight Institute, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2000, Australia
- Tina Gonzalez
- Department of Clinical Genetics, Royal North Shore Hospital, St Leonards, NSW 2065, Australia
- Sondhya Ghedia
- Department of Clinical Genetics, Royal North Shore Hospital, St Leonards, NSW 2065, Australia
- Rhett G Marchant
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2000, Australia
- Adam Bournazos
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Children's Medical Research Institute, Westmead, NSW 2145, Australia
- Wui-Kwan Wong
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Children's Medical Research Institute, Westmead, NSW 2145, Australia; Department of Paediatric Neurology, Children's Hospital at Westmead, Sydney, NSW 2000, Australia
- Richard I Webster
- Department of Paediatric Neurology, Children's Hospital at Westmead, Sydney, NSW 2000, Australia
- Frances J Evesson
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2000, Australia; Children's Medical Research Institute, Westmead, NSW 2145, Australia
- Kristi J Jones
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Children's Medical Research Institute, Westmead, NSW 2145, Australia; Department of Clinical Genetics, Children's Hospital at Westmead, Sydney, NSW 2000, Australia
- Sandra T Cooper
- Kids Neuroscience Centre, Kids Research, Children's Hospital at Westmead, Westmead, NSW 2145, Australia; Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2000, Australia; Children's Medical Research Institute, Westmead, NSW 2145, Australia
- Ryan Lister
- Harry Perkins Institute of Medical Research, Nedlands, WA 6009, Australia; Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, The University of Western Australia, Crawley, WA 6009, Australia
- Jozef Gecz
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
- Lachlan A Jolly
- The Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia; School of Biomedicine, University of Adelaide, Adelaide, SA 5005, Australia
2
Kowalski MH, Wessels HH, Linder J, Dalgarno C, Mascio I, Choudhary S, Hartman A, Hao Y, Kundaje A, Satija R. Multiplexed single-cell characterization of alternative polyadenylation regulators. Cell 2024; 187:4408-4425.e23. [PMID: 38925112 DOI: 10.1016/j.cell.2024.06.005] [Received: 02/09/2023] [Revised: 03/12/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024]
Abstract
Most mammalian genes have multiple polyA sites, representing a substantial source of transcript diversity regulated by the cleavage and polyadenylation (CPA) machinery. To better understand how these proteins govern polyA site choice, we introduce CPA-Perturb-seq, a multiplexed perturbation screen dataset of 42 CPA regulators with a 3' scRNA-seq readout that enables transcriptome-wide inference of polyA site usage. We develop a framework to detect perturbation-dependent changes in polyadenylation and characterize modules of co-regulated polyA sites. We find groups of intronic polyA sites regulated by distinct components of the nuclear RNA life cycle, including elongation, splicing, termination, and surveillance. We train and validate a deep neural network (APARENT-Perturb) for tandem polyA site usage, delineating a cis-regulatory code that predicts perturbation response and reveals interactions between regulatory complexes. Our work highlights the potential for multiplexed single-cell perturbation screens to further our understanding of post-transcriptional regulation.
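To illustrate what "polyA site usage" means at the level of read counts, the sketch below computes per-site usage fractions for one gene and a log2 shift in proximal-versus-distal usage between control and perturbed cells. It is a simple descriptive statistic under assumed inputs (pseudobulked 3' read counts per site, ordered proximal to distal); it is not the statistical framework or the APARENT-Perturb model described in the paper.

```python
import math
from typing import List, Sequence


def site_usage(counts: Sequence[int]) -> List[float]:
    """Fraction of a gene's 3' reads assigned to each polyA site."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads assigned to this gene")
    return [c / total for c in counts]


def proximal_shift(control: Sequence[int], perturbed: Sequence[int], pseudocount: float = 0.5) -> float:
    """log2 change in proximal-vs-distal usage odds after a perturbation.

    Assumes site 0 is the most proximal (5'-most) polyA site; all other sites
    are pooled as "distal". Positive values indicate a shift toward proximal usage.
    """

    def log2_odds(counts: Sequence[int]) -> float:
        proximal = counts[0] + pseudocount
        distal = sum(counts[1:]) + pseudocount
        return math.log2(proximal / distal)

    return log2_odds(perturbed) - log2_odds(control)
```

For example, `site_usage([30, 10])` gives `[0.75, 0.25]`, and a gene moving from counts `[10, 40]` in control cells to `[40, 10]` after knockdown of a hypothetical regulator yields a positive shift. Pooling all non-proximal sites keeps the statistic defined for genes with more than two tandem sites, at the cost of ignoring changes among the distal sites.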
Affiliation(s)
- Madeline H Kowalski
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
- Hans-Hermann Wessels
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Johannes Linder
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Isabella Mascio
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Saket Choudhary
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Yuhan Hao
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Rahul Satija
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
3
Quinones-Valdez G, Amoah K, Xiao X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. bioRxiv 2024:2024.06.14.599101. [PMID: 38915585 PMCID: PMC11195283 DOI: 10.1101/2024.06.14.599101] [Indexed: 06/26/2024]
Abstract
Genetic regulation of alternative splicing constitutes an important link between genetic variation and disease. However, RNA splicing is regulated by both cis-acting elements and trans-acting splicing factors. Determining which splicing events are directed primarily by cis- or trans-acting mechanisms will greatly inform our understanding of the genetic basis of disease. Here, we show that long-read RNA-seq, combined with our new method isoLASER, enables a clear segregation of cis- and trans-directed splicing events for individual samples. The genetic linkage of splicing is largely individual-specific, in stark contrast to the tissue-specific pattern of splicing profiles. Analysis of long-read RNA-seq data from human and mouse revealed thousands of cis-directed splicing events susceptible to genetic regulation. We highlight such events in the HLA genes, whose analysis was challenging with short-read data. We also highlight novel cis-directed splicing events in Alzheimer's disease-relevant genes such as MAPT and BIN1. Together, the clear demarcation of cis- and trans-directed splicing paves the way for future studies of the genetic basis of disease.
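The intuition behind cis-directed splicing can be illustrated with phased long reads: if exon inclusion tracks the allele carried on the same read, the event looks allele-linked. The sketch below tallies reads into a 2x2 allele-by-inclusion table and applies a two-sided Fisher's exact test. This is a generic association test chosen for illustration; it is not isoLASER's method, and the read representation is hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def allele_exon_table(reads: Iterable[Tuple[str, bool]]) -> Tuple[int, int, int, int]:
    """Count reads as (ref&included, ref&skipped, alt&included, alt&skipped).

    Each read is a hypothetical (allele, exon_included) pair with allele in {"ref", "alt"}.
    """
    counts = {("ref", True): 0, ("ref", False): 0, ("alt", True): 0, ("alt", False): 0}
    for allele, included in reads:
        counts[(allele, bool(included))] += 1
    return (counts[("ref", True)], counts[("ref", False)],
            counts[("alt", True)], counts[("alt", False)])


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

With 10 reference-allele reads that all include the exon and 10 alternative-allele reads that all skip it, `fisher_exact_two_sided(*allele_exon_table(reads))` is about 1.1e-5, whereas a balanced table gives 1.0. A real analysis would also need to handle phasing errors, multiple heterozygous sites, and testing many events per sample.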
Affiliation(s)
- Giovanni Quinones-Valdez
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kofi Amoah
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
4
Chen K, Zhou Y, Ding M, Wang Y, Ren Z, Yang Y. Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Brief Bioinform 2024; 25:bbae163. [PMID: 38605640 PMCID: PMC11009468 DOI: 10.1093/bib/bbae163] [Received: 01/04/2024] [Revised: 02/22/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024]
Abstract
Language models pretrained by self-supervised learning (SSL) have been widely used to study protein sequences, whereas few models have been developed for genomic sequences, and those were limited to a single species. Because they are not trained on genomes from multiple species, these models cannot effectively leverage evolutionary information. In this study, we developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Meanwhile, the learned hidden states and attention weights can characterize the biological properties of splice sites. As a result, SpliceBERT was shown to be effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlights the importance of pretraining genomic language models on a diverse range of species and suggests that SSL is a promising approach to enhance our understanding of the regulatory logic underlying genomic sequences.
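Masked language modeling, the pretraining objective named above, hides a subset of input tokens and trains the model to recover them. The sketch below shows BERT-style masking of single-nucleotide tokens. The single-nucleotide tokenization, the 15% masking rate, and the 80/10/10 replacement split are assumptions borrowed from the original BERT recipe, not details confirmed by this abstract for SpliceBERT.

```python
import random
from typing import List, Optional, Tuple

NUCLEOTIDES = ["A", "C", "G", "T"]
MASK = "[MASK]"


def mask_sequence(seq: str, mask_rate: float = 0.15,
                  seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking of single-nucleotide tokens (assumed tokenization).

    Returns (inputs, labels). labels holds the original nucleotide at selected
    positions and None elsewhere, so a training loss would be computed only where
    the model must reconstruct a hidden nucleotide.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for nt in seq.upper():
        if rng.random() < mask_rate:
            labels.append(nt)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)                     # 80%: replace with the mask token
            elif r < 0.9:
                inputs.append(rng.choice(NUCLEOTIDES))  # 10%: random nucleotide
            else:
                inputs.append(nt)                       # 10%: keep the original token
        else:
            inputs.append(nt)
            labels.append(None)
    return inputs, labels
```

Keeping some selected tokens unchanged or randomized, rather than always masking them, is the BERT convention for reducing the mismatch between pretraining inputs (which contain mask tokens) and downstream inputs (which do not).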
Affiliation(s)
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yue Zhou
- Peng Cheng Laboratory, Shenzhen, China
- Maolin Ding
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yu Wang
- Peng Cheng Laboratory, Shenzhen, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University), Ministry of Education, China
5
Tang Z, Koo PK. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv 2024:2024.02.29.582810. [PMID: 38464101 PMCID: PMC10925287 DOI: 10.1101/2024.02.29.582810] [Indexed: 03/12/2024]
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learn a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown pre-trained gLMs can be leveraged to improve prediction performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that current gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major limitation with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
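The conventional baseline the authors compare against represents each sequence as a one-hot matrix. A minimal encoder is sketched below; the A, C, G, T channel order and the all-zero row for ambiguous bases such as N are common conventions assumed here, not details taken from the preprint.

```python
from typing import List


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA sequence as L rows of 4 channels in A, C, G, T order.

    Lowercase (soft-masked) bases are treated like uppercase; N and any other
    symbol become an all-zero row (an assumed convention).
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows: List[List[int]] = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows
```

For example, `one_hot("ACgN")` yields `[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]`. Some pipelines instead encode N as 0.25 in every channel; which choice is used matters little for the comparison the preprint draws, since both carry no learned prior about sequence.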
Affiliation(s)
- Ziqi Tang
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
- Peter K Koo
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, NY, USA
6
Gupta K, Yang C, McCue K, Bastani O, Sharp PA, Burge CB, Solar-Lezama A. Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing. Genome Biol 2024; 25:23. [PMID: 38229106 PMCID: PMC10790492 DOI: 10.1186/s13059-023-03162-x] [Received: 07/25/2023] [Accepted: 12/28/2023] [Indexed: 01/18/2024]
Abstract
Sequence-specific RNA-binding proteins (RBPs) play central roles in splicing decisions. Here, we describe a modular splicing architecture that leverages in vitro-derived RNA affinity models for 79 human RBPs and the annotated human genome to produce improved models of RBP binding and activity. Binding and activity are modeled by separate Motif and Aggregator components that can be mixed and matched, enforcing sparsity to improve interpretability. Training a new Adjusted Motif (AM) architecture on the splicing task not only yields better splicing predictions but also improves prediction of RBP-binding sites in vivo and of splicing activity, assessed using independent data.
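As background for what a motif "affinity model" contributes, the sketch below scores an RNA sequence with a position weight matrix (PWM) in log2-odds form and keeps only windows above a threshold, a crude stand-in for the sparsity the authors enforce. The toy motif (UGCAUG, the well-known RBFOX binding element), the match probability, and the threshold are hypothetical choices; this is not the paper's Motif, Aggregator, or Adjusted Motif architecture.

```python
import math
from typing import Dict, List

Column = Dict[str, float]


def consensus_pwm(motif: str, p_match: float = 0.85, background: float = 0.25) -> List[Column]:
    """Build a toy log2-odds PWM from a consensus string over the RNA alphabet ACGU."""
    if not 0.0 < p_match < 1.0:
        raise ValueError("p_match must be strictly between 0 and 1")
    p_other = (1.0 - p_match) / 3  # remaining probability spread over the other three bases
    return [
        {nt: math.log2((p_match if nt == base else p_other) / background) for nt in "ACGU"}
        for base in motif
    ]


def scan(seq: str, pwm: List[Column], threshold: float) -> List[int]:
    """Start positions of windows whose summed log2-odds score is at least threshold."""
    width = len(pwm)
    hits: List[int] = []
    for start in range(len(seq) - width + 1):
        # Characters outside ACGU (e.g. N) score -inf, so such windows are never reported.
        score = sum(pwm[j].get(seq[start + j], float("-inf")) for j in range(width))
        if score >= threshold:
            hits.append(start)
    return hits
```

With `pwm = consensus_pwm("UGCAUG")`, an exact match scores about 10.6 bits and a single mismatch about 6.5, so `scan("AAUGCAUGAA", pwm, 9.0)` reports only position 2. Raising the threshold makes the hit set sparser, which is the interpretability trade-off the abstract alludes to.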
Affiliation(s)
- Kavi Gupta
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Chenxi Yang
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78712, USA
- Kayla McCue
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Osbert Bastani
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Phillip A Sharp
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Armando Solar-Lezama
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
7
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS: We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and, when controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION: SpliceAI and Pangolin show the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
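One of the practical issues raised above is choosing a score cutoff. A simple, generic way to do this against experimentally labelled variants (for example, MPSA-derived disruptive vs. neutral calls) is to sweep candidate cutoffs and keep the one that maximizes Youden's J (sensitivity + specificity - 1). The sketch below works on hypothetical scores and labels; it is not necessarily the strategy recommended in the paper.

```python
from typing import Optional, Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when variants scoring >= cutoff are called disruptive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one disruptive and one neutral variant")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) for the observed score that maximizes Youden's J."""
    best: Optional[Tuple[float, float]] = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:  # ties keep the lower cutoff
            best = (cutoff, j)
    if best is None:
        raise ValueError("no scores supplied")
    return best
```

For scores `[0.9, 0.8, 0.3, 0.1]` with labels `[True, True, False, False]`, `best_cutoff` returns `(0.8, 1.0)`. Youden's J weights sensitivity and specificity equally; a diagnostic setting that prioritizes not missing SDVs might instead fix a minimum sensitivity and pick the cutoff with the best specificity above it.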
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
8
Scheller IF, Lutz K, Mertes C, Yépez VA, Gagneur J. Improved detection of aberrant splicing with FRASER 2.0 and the intron Jaccard index. Am J Hum Genet 2023; 110:2056-2067. [PMID: 38006880 PMCID: PMC10716352 DOI: 10.1016/j.ajhg.2023.10.014] [Received: 05/10/2023] [Revised: 10/20/2023] [Accepted: 10/26/2023] [Indexed: 11/27/2023]
Abstract
Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signals into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants by 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application to 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare-disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.
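To make the intron Jaccard index concrete: as we understand the definition, for an intron with donor site D and acceptor site A it divides the split reads supporting exactly that intron by the union of all split reads sharing D or A, plus the nonsplit reads overlapping D and A (which signal intron retention). Written out, J(D, A) = n(D, A) / (n(D, ·) + n(·, A) - n(D, A) + n_ns(D) + n_ns(A)). The sketch below implements that formula from a hypothetical junction-count dictionary; it should be checked against the FRASER 2.0 paper and software before being relied on.

```python
import math
from typing import Dict, Tuple

Junction = Tuple[int, int]  # hypothetical (donor, acceptor) genomic coordinates


def intron_jaccard(junctions: Dict[Junction, int], donor: int, acceptor: int,
                   nonsplit: Dict[int, int]) -> float:
    """Intron Jaccard index for the intron (donor, acceptor), per the formula above.

    junctions: split-read counts per (donor, acceptor) pair.
    nonsplit: counts of nonsplit reads overlapping each splice site.
    Returns NaN when no reads touch either splice site.
    """
    n_da = junctions.get((donor, acceptor), 0)
    n_donor = sum(c for (d, _), c in junctions.items() if d == donor)        # donor D, any acceptor
    n_acceptor = sum(c for (_, a), c in junctions.items() if a == acceptor)  # acceptor A, any donor
    union = n_donor + n_acceptor - n_da + nonsplit.get(donor, 0) + nonsplit.get(acceptor, 0)
    return n_da / union if union > 0 else math.nan
```

With 80 reads on the canonical intron, 10 using an alternative acceptor, 5 using an alternative donor, and 5 nonsplit reads at each site, the index is 80/105 (about 0.76). A single value therefore drops whether the competing signal comes from an alternative donor, an alternative acceptor, or intron retention, which is the redundancy reduction the abstract describes.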
Affiliation(s)
- Ines F Scheller
- School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany; Computational Health Center, Helmholtz Center Munich, 85764 Neuherberg, Germany
- Karoline Lutz
- School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany
- Christian Mertes
- School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany; Munich Data Science Institute, Technical University of Munich, 85748 Garching, Germany; Institute of Human Genetics, School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Vicente A Yépez
- School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany; Computational Health Center, Helmholtz Center Munich, 85764 Neuherberg, Germany; Munich Data Science Institute, Technical University of Munich, 85748 Garching, Germany; Institute of Human Genetics, School of Medicine, Technical University of Munich, 81675 Munich, Germany
9
Wang R, Helbig I, Edmondson AC, Lin L, Xing Y. Splicing defects in rare diseases: transcriptomics and machine learning strategies towards genetic diagnosis. Brief Bioinform 2023; 24:bbad284. [PMID: 37580177 PMCID: PMC10516351 DOI: 10.1093/bib/bbad284] [Received: 03/14/2023] [Revised: 07/10/2023] [Accepted: 07/20/2023] [Indexed: 08/16/2023]
Abstract
Genomic variants affecting pre-messenger RNA splicing and its regulation are known to underlie many rare genetic diseases. However, common workflows for genetic diagnosis and clinical variant interpretation frequently overlook splice-altering variants. To better serve patient populations and advance biomedical knowledge, it has become increasingly important to develop and refine approaches for detecting and interpreting pathogenic splicing variants. In this review, we will summarize a few recent developments and challenges in using RNA sequencing technologies for rare disease investigation. Moreover, we will discuss how recent computational splicing prediction tools have emerged as complementary approaches for revealing disease-causing variants underlying splicing defects. We speculate that continuous improvements to sequencing technologies and predictive modeling will not only expand our understanding of splicing regulation but also bring us closer to filling the diagnostic gap for rare disease patients.
Affiliation(s)
- Robert Wang
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ingo Helbig
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Andrew C Edmondson
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pediatrics, Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yi Xing
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
10
Wagner N, Çelik MH, Hölzlwimmer FR, Mertes C, Prokisch H, Yépez VA, Gagneur J. Aberrant splicing prediction across human tissues. Nat Genet 2023; 55:861-870. [PMID: 37142848 DOI: 10.1038/s41588-023-01373-3] [Received: 04/05/2022] [Accepted: 03/14/2023] [Indexed: 05/06/2023]
Abstract
Aberrant splicing is a major cause of genetic disorders, but its direct detection in transcriptomes is limited to clinically accessible tissues such as skin or body fluids. While DNA-based machine learning models can prioritize rare variants for affecting splicing, their performance in predicting tissue-specific aberrant splicing remains unassessed. Here we generated an aberrant splicing benchmark dataset, spanning over 8.8 million rare variants in 49 human tissues from the Genotype-Tissue Expression (GTEx) dataset. At 20% recall, state-of-the-art DNA-based models achieve at most 12% precision. By mapping and quantifying tissue-specific splice site usage transcriptome-wide and modeling isoform competition, we increased precision threefold at the same recall. Integrating RNA-sequencing data of clinically accessible tissues into our model, AbSplice, brought precision to 60%. These results, replicated in two independent cohorts, substantially contribute to noncoding loss-of-function variant identification and to genetic diagnostics design and analytics.
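The headline numbers above are precision at a fixed recall. The sketch below shows one way to compute that metric from per-variant scores and binary aberrant-splicing labels: rank by score, walk down the list, and report precision at the first point where recall reaches the target. Ties are broken by input order here, which is a simplification; the scores and labels are hypothetical, and this is not the AbSplice evaluation code.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[bool], target_recall: float) -> float:
    """Precision at the first rank (highest score first) where recall reaches target_recall."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("no positive (aberrantly spliced) examples")
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    for _, y in ranked:
        if y:
            tp += 1
        else:
            fp += 1
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)
    return tp / (tp + fp)  # not reached for valid inputs: recall hits 1.0 at the last positive
```

For scores `[0.9, 0.8, 0.7, 0.6, 0.5]` with labels `[True, False, True, False, True]`, precision is 1.0 at 20% recall and 2/3 at 60% recall. Because aberrant splicing events are rare among all rare variants, precision at low recall is the informative regime, which is why the abstract reports it rather than overall accuracy.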
Affiliation(s)
- Nils Wagner
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Muhammed H Çelik
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Florian R Hölzlwimmer
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Christian Mertes
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Holger Prokisch
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Vicente A Yépez
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
11
Dominguez-Alonso S, Carracedo A, Rodriguez-Fontenla C. The non-coding genome in Autism Spectrum Disorders. Eur J Med Genet 2023; 66:104752. [PMID: 37023975 DOI: 10.1016/j.ejmg.2023.104752] [Received: 10/24/2022] [Revised: 03/10/2023] [Accepted: 03/19/2023]
Abstract
Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders (NDDs) characterized by difficulties in social interaction and communication, repetitive behavior, and restricted interests. While ASD have been shown to have a strong genetic component, current research largely focuses on coding regions of the genome. However, non-coding DNA, which makes up ∼99% of the human genome, has recently been recognized as an important contributor to the high heritability of ASD, and novel sequencing technologies have been a milestone in opening up new directions for the study of the gene regulatory networks embedded within the non-coding regions. Here, we summarize current progress on the contribution of non-coding alterations to the pathogenesis of ASD and provide an overview of existing methods allowing for the study of their functional relevance, discussing potential ways of unraveling ASD's "missing heritability".
Affiliation(s)
- S Dominguez-Alonso
- Grupo de Medicina Xenómica, Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidad de Santiago de Compostela, Santiago de Compostela, Spain
- A Carracedo
- Grupo de Medicina Xenómica, Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidad de Santiago de Compostela, Santiago de Compostela, Spain
- Grupo de Medicina Xenómica, Fundación Instituto de Investigación Sanitaria de Santiago de Compostela (FIDIS), Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidad de Santiago de Compostela, Santiago de Compostela, Spain
- C Rodriguez-Fontenla
- Grupo de Medicina Xenómica, Center for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidad de Santiago de Compostela, Santiago de Compostela, Spain
12
Scheller IF, Lutz K, Mertes C, Yépez VA, Gagneur J. Improved detection of aberrant splicing using the Intron Jaccard Index. medRxiv 2023:2023.03.31.23287997. [PMID: 37066374 PMCID: PMC10104204 DOI: 10.1101/2023.03.31.23287997]
Abstract
Detection of aberrantly spliced genes is an important step in RNA-seq-based rare disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method for aberrant splicing detection that outperformed alternative approaches. However, as FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron excision metric, the Intron Jaccard Index, that combines alternative donor, alternative acceptor, and intron retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs using candidate rare splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare splice-disrupting variants 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. Application to 303 rare disease samples confirmed the fold reduction in the number of outlier calls at the cost of a slight loss of sensitivity (only 2 of 22 previously identified pathogenic splicing cases were not recovered). Altogether, these methodological improvements contribute to more effective RNA-seq-based rare disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.
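The abstract describes the Intron Jaccard Index only as a single value combining alternative donor, alternative acceptor and intron retention signal. A minimal sketch follows, assuming the Jaccard form: the intron's own split reads divided by the union of all split reads sharing its donor or its acceptor plus the non-split reads spanning either splice site. This reading of the definition, the coordinate-keyed dictionaries and the function name are assumptions for illustration; consult the preprint and the FRASER package for the exact formulation.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical encoding: an intron is (donor_site, acceptor_site), both genomic positions.
Intron = Tuple[int, int]


def intron_jaccard_index(split_reads: Dict[Intron, int],
                         nonsplit_reads: Dict[int, int]) -> Dict[Intron, float]:
    """Jaccard index of each intron's split reads against all reads at its two sites.

    split_reads: number of split reads supporting each donor-acceptor pair.
    nonsplit_reads: number of non-split reads spanning each splice site
    (the intron retention signal), keyed by site position.
    """
    at_donor: Dict[int, int] = defaultdict(int)
    at_acceptor: Dict[int, int] = defaultdict(int)
    for (donor, acceptor), n in split_reads.items():
        at_donor[donor] += n
        at_acceptor[acceptor] += n

    index: Dict[Intron, float] = {}
    for (donor, acceptor), n in split_reads.items():
        # Union: split reads sharing the donor or the acceptor (the intron's
        # own reads counted once) plus non-split reads at either site.
        union = (at_donor[donor] + at_acceptor[acceptor] - n
                 + nonsplit_reads.get(donor, 0) + nonsplit_reads.get(acceptor, 0))
        index[(donor, acceptor)] = n / union if union > 0 else float("nan")
    return index


# Toy counts (hypothetical): one donor at 100 spliced to two alternative acceptors.
split = {(100, 200): 80, (100, 300): 20}
nonsplit = {100: 10, 200: 5, 300: 0}
for intron, value in intron_jaccard_index(split, nonsplit).items():
    print(intron, round(value, 3))
```

Because alternative donor usage, alternative acceptor usage and retention all enlarge the same denominator, any of the three kinds of aberration lowers one number, which is the property the abstract highlights.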
Affiliation(s)
- Ines F. Scheller
- School of Computation, Information and Technology, Technical University of Munich, Garching, 85748, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764, Germany
- Karoline Lutz
- School of Computation, Information and Technology, Technical University of Munich, Garching, 85748, Germany
- Christian Mertes
- School of Computation, Information and Technology, Technical University of Munich, Garching, 85748, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, 85748, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, 81675, Germany
- Vicente A. Yépez
- School of Computation, Information and Technology, Technical University of Munich, Garching, 85748, Germany
- Julien Gagneur
- School of Computation, Information and Technology, Technical University of Munich, Garching, 85748, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, 85748, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, 81675, Germany
13
Kowalski MH, Wessels HH, Linder J, Choudhary S, Hartman A, Hao Y, Mascio I, Dalgarno C, Kundaje A, Satija R. CPA-Perturb-seq: Multiplexed single-cell characterization of alternative polyadenylation regulators. bioRxiv 2023:2023.02.09.527751. [PMID: 36798324 PMCID: PMC9934614 DOI: 10.1101/2023.02.09.527751]
Abstract
Most mammalian genes have multiple polyA sites, representing a substantial source of transcript diversity that is governed by the cleavage and polyadenylation (CPA) regulatory machinery. To better understand how these proteins govern polyA site choice, we introduce CPA-Perturb-seq, a multiplexed perturbation screen dataset of 42 known CPA regulators with a 3' scRNA-seq readout that enables transcriptome-wide inference of polyA site usage. We develop a statistical framework to specifically identify perturbation-dependent changes in intronic and tandem polyadenylation, and discover modules of co-regulated polyA sites exhibiting distinct functional properties. By training a multi-task deep neural network (APARENT-Perturb) on our dataset, we delineate a cis-regulatory code that predicts responsiveness to perturbation and reveals interactions between distinct regulatory complexes. Finally, we leverage our framework to re-analyze published scRNA-seq datasets, identifying new regulators that affect the relative abundance of alternatively polyadenylated transcripts, and characterizing extensive cellular heterogeneity in 3' UTR length amongst antibody-producing cells. Our work highlights the potential for multiplexed single-cell perturbation screens to further our understanding of post-transcriptional regulation in vitro and in vivo.
Affiliation(s)
- Madeline H. Kowalski
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York University Grossman School of Medicine, New York, NY, USA
- Hans-Hermann Wessels
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Johannes Linder
- Department of Genetics, Stanford University, Stanford, USA
- Department of Computer Science, Stanford University, Stanford, USA
- Saket Choudhary
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Yuhan Hao
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Isabella Mascio
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, USA
- Department of Computer Science, Stanford University, Stanford, USA
- Rahul Satija
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
- New York University Grossman School of Medicine, New York, NY, USA
14
Linder J, Koplik SE, Kundaje A, Seelig G. Deciphering the impact of genetic variation on human polyadenylation using APARENT2. Genome Biol 2022; 23:232. [PMID: 36335397 PMCID: PMC9636789 DOI: 10.1186/s13059-022-02799-4] [Received: 05/17/2022] [Accepted: 10/19/2022]
Abstract
BACKGROUND 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging. RESULTS We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells. 
CONCLUSIONS A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
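In silico saturation mutagenesis, as mentioned in the results above, means scoring every possible single-nucleotide substitution of a sequence with a model and recording the change relative to the reference. The sketch below shows only that enumeration pattern. The scoring function `toy_pas_score`, which simply counts canonical AATAAA hexamers, is a hypothetical stand-in for a trained model such as APARENT2, whose actual interface is not described here.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def saturation_mutagenesis(seq: str, score: Callable[[str], float]
                           ) -> List[Tuple[int, str, str, float]]:
    """Score every single-nucleotide substitution of seq.

    Returns (position, ref, alt, score(mutant) - score(reference)) tuples,
    three per position for an uppercase A/C/G/T sequence.
    """
    ref_score = score(seq)
    effects = []
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects.append((pos, ref, alt, score(mutant) - ref_score))
    return effects


def toy_pas_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts AATAAA hexamers."""
    return float(sum(seq[i:i + 6] == "AATAAA" for i in range(len(seq) - 5)))


effects = saturation_mutagenesis("GAATAAAG", toy_pas_score)
disrupting = [e for e in effects if e[3] < 0]
print(len(effects), len(disrupting))  # 24 18
```

With a real model the same loop yields one predicted effect per possible variant, which can then be compared with variants actually observed in a population database, as the abstract describes for gnomAD.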
Affiliation(s)
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, USA
- Department of Computer Science, Stanford University, Stanford, USA
- Georg Seelig
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Electrical and Computer Engineering, University of Washington, Seattle, USA
15
Malina S, Cizin D, Knowles DA. Deep mendelian randomization: Investigating the causal knowledge of genomic deep learning models. PLoS Comput Biol 2022; 18:e1009880. [PMID: 36265006 PMCID: PMC9624391 DOI: 10.1371/journal.pcbi.1009880] [Received: 02/01/2022] [Revised: 11/01/2022] [Accepted: 09/19/2022]
Abstract
Multi-task deep learning (DL) models can accurately predict diverse genomic marks from sequence, but whether these models learn the causal relationships between genomic marks is unknown. Here, we describe Deep Mendelian Randomization (DeepMR), a method for estimating causal relationships between genomic marks learned by genomic DL models. By combining Mendelian randomization with in silico mutagenesis, DeepMR obtains local (locus specific) and global estimates of (an assumed) linear causal relationship between marks. In a simulation designed to test recovery of pairwise causal relations between transcription factors (TFs), DeepMR gives accurate and unbiased estimates of the 'true' global causal effect, but its coverage decays in the presence of sequence-dependent confounding. We then apply DeepMR to examine the global relationships learned by a state-of-the-art DL model, BPNet, between TFs involved in reprogramming. DeepMR's causal effect estimates validate previously hypothesized relationships between TFs and suggest new relationships for future investigation.
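To make the idea of combining Mendelian randomization with in silico mutagenesis concrete, the sketch below applies a textbook inverse-variance-weighted (IVW) estimator with all standard errors assumed equal, which reduces to a least-squares slope through the origin. It assumes the per-mutation effects on an exposure mark and an outcome mark have already been predicted by a sequence model. The function name, the equal-weight simplification and the weak-instrument cutoff are illustrative assumptions, not the DeepMR implementation, which also reports locus-specific estimates.

```python
from typing import Sequence


def ivw_causal_estimate(exposure_effects: Sequence[float],
                        outcome_effects: Sequence[float],
                        min_abs_exposure: float = 0.0) -> float:
    """Equal-weight IVW Mendelian randomization estimate.

    Each in silico mutation acts as an instrument: exposure_effects[i] is its
    predicted effect on the upstream mark (e.g. binding of TF A) and
    outcome_effects[i] its predicted effect on the downstream mark (TF B).
    """
    if len(exposure_effects) != len(outcome_effects):
        raise ValueError("effect vectors must have the same length")
    # Drop weak instruments: mutations with little effect on the exposure.
    pairs = [(bx, by) for bx, by in zip(exposure_effects, outcome_effects)
             if abs(bx) > min_abs_exposure]
    if not pairs:
        raise ValueError("no mutation passes the instrument-strength filter")
    # Least-squares slope through the origin: sum(bx*by) / sum(bx^2).
    numerator = sum(bx * by for bx, by in pairs)
    denominator = sum(bx * bx for bx, _ in pairs)
    return numerator / denominator


# Toy ISM effects (hypothetical); the last mutation is a weak instrument.
bx = [1.0, 2.0, -1.0, 0.01]
by = [0.5, 1.0, -0.5, 0.3]
print(ivw_causal_estimate(bx, by, min_abs_exposure=0.1))  # 0.5
```

The estimate is only causal under the usual MR assumptions; as the abstract notes, sequence-dependent confounding (a mutation affecting both marks through separate paths) degrades it.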
Affiliation(s)
- Stephen Malina
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Dyno Therapeutics, Watertown, Massachusetts, United States of America
- Daniel Cizin
- Department of Computer Science, Columbia University, New York, New York, United States of America
- Tri-Institutional Ph.D. Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, New York, United States of America
- David A. Knowles
- Department of Computer Science, Columbia University, New York, New York, United States of America
- New York Genome Center, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
- Data Science Institute, Columbia University, New York, New York, United States of America
16
Ling JP, Bygrave AM, Santiago CP, Carmen-Orozco RP, Trinh VT, Yu M, Li Y, Liu Y, Bowden KD, Duncan LH, Han J, Taneja K, Dongmo R, Babola TA, Parker P, Jiang L, Leavey PJ, Smith JJ, Vistein R, Gimmen MY, Dubner B, Helmenstine E, Teodorescu P, Karantanos T, Ghiaur G, Kanold PO, Bergles D, Langmead B, Sun S, Nielsen KJ, Peachey N, Singh MS, Dalton WB, Rajaii F, Huganir RL, Blackshaw S. Cell-specific regulation of gene expression using splicing-dependent frameshifting. Nat Commun 2022; 13:5773. [PMID: 36182931 PMCID: PMC9526712 DOI: 10.1038/s41467-022-33523-2] [Received: 03/14/2022] [Accepted: 09/21/2022]
Abstract
Precise and reliable cell-specific gene delivery remains technically challenging. Here we report a splicing-based approach for controlling gene expression whereby separate translational reading frames are coupled to the inclusion or exclusion of mutated, frameshifting cell-specific alternative exons. Candidate exons are identified by analyzing thousands of publicly available RNA sequencing datasets and filtering by cell specificity, conservation, and local intron length. This method, which we denote splicing-linked expression design (SLED), can be combined in a Boolean manner with existing techniques such as minipromoters and viral capsids. SLED can use strong constitutive promoters, without sacrificing precision, by decoupling the tradeoff between promoter strength and selectivity. AAV-packaged SLED vectors can selectively deliver fluorescent reporters and calcium indicators to various neuronal subtypes in vivo. We also demonstrate gene therapy utility by creating SLED vectors that can target PRPH2 and SF3B1 mutations. The flexibility of SLED technology enables creative avenues for basic and translational research.
Affiliation(s)
- Jonathan P Ling
- Departments of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Alexei M Bygrave
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Clayton P Santiago
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Rogger P Carmen-Orozco
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Vickie T Trinh
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Minzhong Yu
- Department of Ophthalmic Research, Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA
- Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, 44195, USA
- Yini Li
- Departments of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Ying Liu
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kyra D Bowden
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Leighton H Duncan
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Jeong Han
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kamil Taneja
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Rochinelle Dongmo
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Travis A Babola
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Patrick Parker
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Lizhi Jiang
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Patrick J Leavey
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Jennifer J Smith
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Rachel Vistein
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Megan Y Gimmen
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Benjamin Dubner
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Eric Helmenstine
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Patric Teodorescu
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Theodoros Karantanos
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Gabriel Ghiaur
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Patrick O Kanold
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Dwight Bergles
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Ben Langmead
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Shuying Sun
- Departments of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kristina J Nielsen
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Neal Peachey
- Department of Ophthalmic Research, Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA
- Department of Ophthalmology, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, 44195, USA
- Research Service, Louis Stokes Cleveland VA Medical Center, Cleveland, OH, 44106, USA
- Mandeep S Singh
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- W Brian Dalton
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Fatemeh Rajaii
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Richard L Huganir
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Seth Blackshaw
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
17
Ellingford JM, Ahn JW, Bagnall RD, Baralle D, Barton S, Campbell C, Downes K, Ellard S, Duff-Farrier C, FitzPatrick DR, Greally JM, Ingles J, Krishnan N, Lord J, Martin HC, Newman WG, O’Donnell-Luria A, Ramsden SC, Rehm HL, Richardson E, Singer-Berk M, Taylor JC, Williams M, Wood JC, Wright CF, Harrison SM, Whiffin N. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 2022; 14:73. [PMID: 35850704 PMCID: PMC9295495 DOI: 10.1186/s13073-022-01073-3] [Received: 02/09/2022] [Accepted: 06/16/2022]
Abstract
BACKGROUND Most clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. METHODS We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines, which were then extensively tested and reviewed by external groups. RESULTS We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. CONCLUSIONS These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.
Affiliation(s)
- Jamie M. Ellingford
- grid.5379.80000000121662407Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT UK ,grid.498924.a0000 0004 0430 9101Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK ,grid.498322.6Genomics England, London, UK
| | - Joo Wook Ahn
- grid.24029.3d0000 0004 0383 8386Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
| | - Richard D. Bagnall
- grid.1013.30000 0004 1936 834XAgnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia
| | - Diana Baralle
- grid.5491.90000 0004 1936 9297School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK ,grid.430506.40000 0004 0465 4079Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK
| | - Stephanie Barton
- grid.498924.a0000 0004 0430 9101Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
| | - Chris Campbell
- grid.498924.a0000 0004 0430 9101Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
| | - Kate Downes
- grid.24029.3d0000 0004 0383 8386Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
| | - Sian Ellard
- grid.8391.30000 0004 1936 8024Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK ,grid.419309.60000 0004 0495 6261South West Genomic Laboratory Hub, Exeter Genomic Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
| | - Celia Duff-Farrier
- grid.418484.50000 0004 0380 7221South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
| | - David R. FitzPatrick
- grid.417068.c0000 0004 0624 9907MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
| | - John M. Greally
- grid.251993.50000000121791997Department of Pediatrics, Division of Pediatric Genetic, Medicine, Children’s Hospital at Montefiore/Montefiore Medical Center/Albert, Einstein College of Medicine, Bronx, NY USA
| | - Jodie Ingles
- grid.1005.40000 0004 4902 0432Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia ,grid.1058.c0000 0000 9442 535XCentre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
| | - Neesha Krishnan
- grid.1005.40000 0004 4902 0432Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia ,grid.1058.c0000 0000 9442 535XCentre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
| | - Jenny Lord
- grid.5491.90000 0004 1936 9297School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Hilary C. Martin
- grid.10306.340000 0004 0606 5382Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | - William G. Newman
- grid.5379.80000000121662407Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT UK ,grid.498924.a0000 0004 0430 9101Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
- Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Simon C. Ramsden
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Ebony Richardson
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia
- Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jenny C. Taylor
- National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- Maggie Williams
- South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- Jordan C. Wood
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caroline F. Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Steven M. Harrison
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ambry Genetics, Aliso Viejo, CA, USA
- Nicola Whiffin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
18
Abstract
The tremendous amount of biological sequence data available, combined with recent methodological breakthroughs in deep learning in domains such as computer vision and natural language processing, is now transforming bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of genome function, and the possibility of writing synthetic genomic sequences.
19
Zeng T, Li YI. Predicting RNA splicing from DNA sequence using Pangolin. Genome Biol 2022; 23:103. [PMID: 35449021 PMCID: PMC9022248 DOI: 10.1186/s13059-022-02664-4] [Received: 07/15/2021] [Accepted: 04/04/2022]
Abstract
Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.
Affiliation(s)
- Tony Zeng
- The College, University of Chicago, Chicago, IL 60637, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
20
Lord J, Baralle D. Splicing in the Diagnosis of Rare Disease: Advances and Challenges. Front Genet 2021; 12:689892. [PMID: 34276790 PMCID: PMC8280750 DOI: 10.3389/fgene.2021.689892] [Received: 04/01/2021] [Accepted: 06/07/2021]
Abstract
Mutations which affect splicing are significant contributors to rare disease, but are frequently overlooked by diagnostic sequencing pipelines. Greater ascertainment of pathogenic splicing variants will increase diagnostic yields, ending the diagnostic odyssey for patients and families affected by rare disorders, and improving treatment and care strategies. Advances in sequencing technologies, predictive modeling, and understanding of the mechanisms of splicing in recent years pave the way for improved detection and interpretation of splice affecting variants, yet several limitations still prohibit their routine ascertainment in diagnostic testing. This review explores some of these advances in the context of clinical application and discusses challenges to be overcome before these variants are comprehensively and routinely recognized in diagnostics.
Affiliation(s)
- Jenny Lord
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, United Kingdom
21
Publisher Correction: MTSplice predicts effects of genetic variants on tissue-specific splicing. Genome Biol 2021; 22:107. [PMID: 33858505 PMCID: PMC8050885 DOI: 10.1186/s13059-021-02338-7]