1
Sibbesen JA, Eizenga JM, Novak AM, Sirén J, Chang X, Garrison E, Paten B. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat Methods 2023; 20:239-247. [PMID: 36646895 DOI: 10.1101/2021.03.26.437240] [Received: 06/18/2021] [Accepted: 11/28/2022]
Abstract
Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.
Affiliation(s)
- Adam M Novak
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jouni Sirén
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Xian Chang
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Erik Garrison
- University of Tennessee Health Science Center, Memphis, TN, USA
2
Keegan NP, Fletcher S. A spotter's guide to SNPtic exons: The common splice variants underlying some SNP-phenotype correlations. Mol Genet Genomic Med 2021; 10:e1840. [PMID: 34708937 PMCID: PMC8801146 DOI: 10.1002/mgg3.1840] [Received: 07/12/2021] [Accepted: 10/12/2021]
Abstract
BACKGROUND: Cryptic exons are typically characterised as deleterious splicing aberrations caused by deep intronic mutations. However, low-level splicing of cryptic exons is sometimes observed in the absence of any pathogenic mutation. Five recent reports have described how low-level splicing of cryptic exons can be modulated by common single-nucleotide polymorphisms (SNPs), resulting in phenotypic differences amongst different genotypes.
METHODS: We sought to investigate whether additional 'SNPtic' exons may exist, and whether these could provide an explanatory mechanism for some of the genotype-phenotype correlations revealed by genome-wide association studies. We thoroughly searched the literature for reported cryptic exons, cross-referenced their genomic coordinates against the dbSNP database of common SNPs, then screened out SNPs with no reported phenotype associations.
RESULTS: This method discovered five probable SNPtic exons in the genes APC, FGB, GHRL, MYBPC3 and OTC. For four of these five exons, we observed that the phenotype associated with the SNP was compatible with the predicted splicing effect of the nucleotide change, whilst the fifth (in GHRL) likely had a more complex splice-switching effect.
CONCLUSION: Application of our search methods could augment the knowledge value of future cryptic exon reports and aid in generating better hypotheses for genome-wide association studies.
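The screening step described under METHODS (cross-referencing cryptic exon coordinates against common SNPs, then discarding SNPs with no reported phenotype association) amounts to an interval lookup. The following is a minimal sketch of that idea, not the authors' pipeline: the `CrypticExon` record, the `candidate_snptic_exons` function, its optional `flank` parameter and all toy coordinates and rsIDs are hypothetical. A real search would load common variants from dbSNP and phenotype associations from a curated source instead of the inline lists.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CrypticExon:
    gene: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidate_snptic_exons(exons, common_snps, phenotype_rsids, flank=0):
    """Return (gene, rsid) pairs for common SNPs lying inside a cryptic exon
    (optionally extended by `flank` bp on each side), keeping only SNPs that
    appear in `phenotype_rsids`. All names here are hypothetical."""
    # Build a per-chromosome index of position-sorted SNPs.
    index = {}
    for chrom, pos, rsid in sorted(common_snps):
        positions, ids = index.setdefault(chrom, ([], []))
        positions.append(pos)
        ids.append(rsid)

    hits = []
    for exon in exons:
        positions, ids = index.get(exon.chrom, ([], []))
        # Binary search for SNPs in [start - flank, end + flank].
        lo = bisect_left(positions, exon.start - flank)
        hi = bisect_right(positions, exon.end + flank)
        for rsid in ids[lo:hi]:
            if rsid in phenotype_rsids:  # drop SNPs with no reported phenotype
                hits.append((exon.gene, rsid))
    return hits


# Hypothetical toy data, not real coordinates or rsIDs.
exons = [CrypticExon("GENE_A", "chr1", 1_000, 1_120),
         CrypticExon("GENE_B", "chr2", 5_000, 5_090)]
snps = [("chr1", 1_050, "rs_hypo1"), ("chr1", 2_000, "rs_hypo2"),
        ("chr2", 4_995, "rs_hypo3")]
associated = {"rs_hypo1", "rs_hypo3"}
print(candidate_snptic_exons(exons, snps, associated))
# [('GENE_A', 'rs_hypo1')]
print(candidate_snptic_exons(exons, snps, associated, flank=10))
# [('GENE_A', 'rs_hypo1'), ('GENE_B', 'rs_hypo3')]
```

The `flank` option is an illustrative addition: SNPs just outside an exon can still alter its splice sites, so widening the window is one way a screen of this kind might catch them.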
Affiliation(s)
- Niall Patrick Keegan
- Murdoch University, Murdoch, Western Australia, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Perth, Western Australia, Australia
- Perron Institute, Perth, Western Australia, Australia
- Sue Fletcher
- Murdoch University, Murdoch, Western Australia, Australia
- Centre for Molecular Medicine and Innovative Therapeutics, Perth, Western Australia, Australia
- University of Western Australia, Perth, Western Australia, Australia
3
Sakaguchi N, Suyama M. In silico identification of pseudo-exon activation events in personal genome and transcriptome data. RNA Biol 2021; 18:382-390. [PMID: 32865117 PMCID: PMC7951959 DOI: 10.1080/15476286.2020.1809195] [Received: 06/11/2020] [Revised: 08/03/2020] [Accepted: 08/08/2020]
Abstract
Causative mutations for human genetic disorders have mainly been identified in exonic regions that code for amino acid sequences. Recently, however, it has been reported that mutations in deep intronic regions can also cause certain human genetic disorders by creating novel splice sites, leading to pseudo-exon activation. To investigate how frequently pseudo-exon activation events occur in normal individuals, we conducted in silico identification of such events using personal genome data and corresponding high-quality transcriptome data. Under fairly stringent criteria, an average of 2.6 pseudo-exon activation events per individual was identified. More pseudo-exon activation events were found at 5' donor splice sites than at 3' acceptor splice sites. Although pseudo-exon activation events have sporadically been reported as causative mutations in genetic disorders, this study shows that such events also occur in normal individuals at a measurable frequency. We estimate that a human genome contains, on average, at least 10 pseudo-exon activation events. The actual number is likely higher, because we used stringent criteria to identify pseudo-exon activation events. This suggests that it is worth considering the possibility of pseudo-exon activation when searching for causative mutations of genetic disorders if candidate mutations are not identified in coding regions or RNA splice sites.
Affiliation(s)
- Narumi Sakaguchi
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Mikita Suyama
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
4
Demirdjian L, Xu Y, Bahrami-Samani E, Pan Y, Stein S, Xie Z, Park E, Wu YN, Xing Y. Detecting Allele-Specific Alternative Splicing from Population-Scale RNA-Seq Data. Am J Hum Genet 2020; 107:461-472. [PMID: 32781045 PMCID: PMC7477012 DOI: 10.1016/j.ajhg.2020.07.005] [Received: 12/22/2019] [Accepted: 07/10/2020]
Abstract
RNA sequencing (RNA-seq) is a powerful technology for studying human transcriptome variation. We introduce PAIRADISE (Paired Replicate Analysis of Allelic Differential Splicing Events), a method for detecting allele-specific alternative splicing (ASAS) from RNA-seq data. Unlike conventional approaches that detect ASAS events one sample at a time, PAIRADISE aggregates ASAS signals across multiple individuals in a population. By treating the two alleles of an individual as paired, and multiple individuals sharing a heterozygous SNP as replicates, we formulate ASAS detection using PAIRADISE as a statistical problem for identifying differential alternative splicing from RNA-seq data with paired replicates. PAIRADISE outperforms alternative statistical models in simulation studies. Applying PAIRADISE to replicate RNA-seq data of a single individual and to population-scale RNA-seq data across many individuals, we detect ASAS events associated with genome-wide association study (GWAS) signals of complex traits or diseases. Additionally, PAIRADISE ASAS analysis detects the effects of rare variants on alternative splicing. PAIRADISE provides a useful computational tool for elucidating the genetic variation and phenotypic association of alternative splicing in populations.
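The core framing of PAIRADISE, treating the two alleles of each heterozygous individual as a pair and the individuals as replicates, can be illustrated with a much simpler test. The sketch below does not reproduce PAIRADISE's statistical model; it swaps in a plain exact sign-flip permutation test on per-individual differences in percent spliced in (PSI), with PSI computed as inclusion reads over inclusion plus skipping reads and no read-length normalisation. The function names and all read counts are hypothetical.

```python
from itertools import product
from statistics import mean


def psi(inclusion, skipping):
    """Percent spliced in: fraction of informative reads supporting inclusion."""
    total = inclusion + skipping
    if total == 0:
        raise ValueError("allele has no informative reads")
    return inclusion / total


def paired_allelic_test(pairs):
    """pairs: one ((inc_A, skip_A), (inc_B, skip_B)) tuple per heterozygous
    individual, where A and B are the two alleles of the shared SNP.
    Returns the mean within-individual PSI difference (A - B) and an exact
    two-sided sign-flip permutation p-value (2**n patterns, so small n only)."""
    diffs = [psi(*a) - psi(*b) for a, b in pairs]
    observed = abs(mean(diffs))
    sign_patterns = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1 for signs in sign_patterns
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
    )
    return mean(diffs), extreme / len(sign_patterns)


# Hypothetical allele-specific read counts for four heterozygous individuals.
pairs = [((30, 10), (10, 30)), ((25, 5), (12, 18)),
         ((40, 10), (15, 35)), ((20, 20), (5, 35))]
delta, p = paired_allelic_test(pairs)
print(round(delta, 3), p)  # 0.452 0.125
```

With only four individuals the smallest attainable p-value is 2/16 = 0.125, which shows why pooling many individuals sharing a heterozygous SNP, as the abstract describes, increases power.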
5
Ruiz-Reche A, Srivastava A, Indi JA, de la Rubia I, Eyras E. ReorientExpress: reference-free orientation of nanopore cDNA reads with deep learning. Genome Biol 2019; 20:260. [PMID: 31783882 PMCID: PMC6883653 DOI: 10.1186/s13059-019-1884-z] [Received: 04/12/2019] [Accepted: 11/07/2019]
Abstract
We describe ReorientExpress, a method to perform reference-free orientation of transcriptomic long sequencing reads. ReorientExpress uses deep learning to correctly predict the orientation of the majority of reads, particularly when trained on a closely related species or when combined with read clustering. ReorientExpress enables long-read transcriptomics in non-model organisms and in samples lacking a genome reference, without requiring additional technologies, and is available at https://github.com/comprna/reorientexpress.
Affiliation(s)
- Akanksha Srivastava
- The John Curtin School of Medical Research, Australian National University, Acton ACT, Canberra, 2601, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Acton ACT, Canberra, 2601, Australia
- Joel A Indi
- Pompeu Fabra University, E08003, Barcelona, Spain
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Acton ACT, Canberra, 2601, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Acton ACT, Canberra, 2601, Australia
- IMIM - Hospital del Mar Medical Research Institute, E08003, Barcelona, Spain
6
Liu X, MacLeod JN, Liu J. iMapSplice: Alleviating reference bias through personalized RNA-seq alignment. PLoS One 2018; 13:e0201554. [PMID: 30096157 PMCID: PMC6086400 DOI: 10.1371/journal.pone.0201554] [Received: 02/23/2018] [Accepted: 07/17/2018]
Abstract
Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The algorithm makes use of personal genomic information and performs an unbiased alignment towards genome indices carrying both reference and alternative bases. Importantly, this breaks the dependency on reference genome splice site dinucleotide motifs and enables iMapSplice to discover personal splice junctions created through splice site polymorphisms. We report comparative analyses using a number of simulated and real datasets. Besides general improvements in read alignment and splice junction discovery, iMapSplice greatly alleviates allelic ratio biases and unravels many previously uncharacterized splice junctions created by splice site polymorphisms, with minimal overhead in computation time and storage. Software download URL: https://github.com/LiuBioinfo/iMapSplice.
Affiliation(s)
- Xinan Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, United States of America
- James N. MacLeod
- Department of Veterinary Science, University of Kentucky, Lexington, KY, United States of America
- Jinze Liu
- Department of Computer Science, University of Kentucky, Lexington, KY, United States of America
7
Park E, Pan Z, Zhang Z, Lin L, Xing Y. The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am J Hum Genet 2018; 102:11-26. [PMID: 29304370 PMCID: PMC5777382 DOI: 10.1016/j.ajhg.2017.11.002] [Received: 08/30/2017] [Accepted: 11/03/2017]
Abstract
Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.
Affiliation(s)
- Eddie Park
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Zhicheng Pan
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Zijun Zhang
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Lan Lin
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Yi Xing
- Department of Microbiology, Immunology, & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
8
Li YI, Knowles DA, Humphrey J, Barbeira AN, Dickinson SP, Im HK, Pritchard JK. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet 2018; 50:151-158. [PMID: 29229983 PMCID: PMC5742080 DOI: 10.1038/s41588-017-0004-9] [Received: 04/04/2017] [Accepted: 11/08/2017]
Abstract
The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
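The annotation-free idea summarised above, grouping introns that share splice sites and quantifying each intron as a fraction of its group's split reads, can be sketched with a union-find. This is a simplified illustration, not LeafCutter's implementation: it omits the read-count thresholds and other details of the published tool, and the `cluster_introns` function, its input format and the counts below are hypothetical.

```python
from collections import defaultdict


def cluster_introns(intron_counts):
    """intron_counts maps (chrom, start, end) -> number of split reads.
    Introns that share a start or an end coordinate (a splice site) are
    merged, transitively, into one cluster. Returns a list of clusters, each
    a dict mapping intron -> its fraction of the cluster's split reads."""
    parent = {intron: intron for intron in intron_counts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every pair of introns that share a splice-site coordinate.
    by_site = defaultdict(list)
    for chrom, start, end in intron_counts:
        by_site[(chrom, "start", start)].append((chrom, start, end))
        by_site[(chrom, "end", end)].append((chrom, start, end))
    for members in by_site.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(list)
    for intron in intron_counts:
        groups[find(intron)].append(intron)

    clusters = []
    for members in groups.values():
        total = sum(intron_counts[i] for i in members)  # assumes total > 0
        clusters.append({i: intron_counts[i] / total for i in sorted(members)})
    return clusters


# Hypothetical junction counts for one sample.
counts = {("chr1", 100, 200): 30, ("chr1", 100, 300): 10,
          ("chr1", 250, 300): 10, ("chr1", 1_000, 1_100): 5}
for cluster in cluster_introns(counts):
    print(cluster)
# {('chr1', 100, 200): 0.6, ('chr1', 100, 300): 0.2, ('chr1', 250, 300): 0.2}
# {('chr1', 1000, 1100): 1.0}
```

Because the three linked introns are compared only with each other, a change in their relative usage between samples can be detected without any transcript annotation, which is the property the abstract emphasises.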
Affiliation(s)
- Yang I Li
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA.
| | - David A Knowles
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Department of Radiology, Stanford University, Stanford, CA, USA.
| | - Jack Humphrey
- UCL Genetics Institute, Gower Street, London, UK
- Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK
| | - Alvaro N Barbeira
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Scott P Dickinson
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, The University of Chicago, Chicago, IL, USA
| | - Jonathan K Pritchard
- Department of Genetics, Stanford University, Stanford, CA, USA.
- Department of Biology, Stanford University, Stanford, CA, USA.
- Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA.
9
Cheng SJ, Shi FY, Liu H, Ding Y, Jiang S, Liang N, Gao G. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Res 2017; 45:e82. [PMID: 28158838 PMCID: PMC5449550 DOI: 10.1093/nar/gkx041] [Received: 10/12/2016] [Accepted: 01/24/2017]
Abstract
In genomics, effectively identifying the biological effects of genetic variants is crucial. Current methods handle each variant independently, assuming that each variant acts in a context-free manner. However, variants within the same gene may interfere with each other, producing combined (compound) rather than individual effects. In this work, we introduce COPE, a gene-centric variant annotation tool that integrates the entire sequential context in evaluating the functional effects of intra-genic variants. Applying COPE to the 1000 Genomes dataset, we identified numerous cases of multiple-variant compound effects that frequently led to false-positive and false-negative loss-of-function calls by conventional variant-centric tools. Specifically, 64 disease-causing mutations were found to be rescued in particular genomic contexts, potentially contributing to the buffering of highly penetrant deleterious mutations. COPE is freely available for academic use at http://cope.cbi.pku.edu.cn.
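A textbook example of a compound effect is two frameshift indels on the same haplotype whose net length change is a multiple of three: each variant called alone looks like a frameshift, but together they restore the downstream reading frame. The sketch below illustrates only that one case; it is not COPE's algorithm, and the `Variant` record, the `frame_call` function and the example positions are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int  # coordinate within the coding sequence (illustrative only)
    ref: str
    alt: str


def net_length_change(variants):
    """Sum of (alt length - ref length) over the variants."""
    return sum(len(v.alt) - len(v.ref) for v in variants)


def frame_call(variants):
    """Call the combined reading-frame effect of variants on ONE haplotype
    of a coding sequence: 'in-frame' if the net indel length is a multiple
    of 3, otherwise 'frameshift'."""
    return "in-frame" if net_length_change(variants) % 3 == 0 else "frameshift"


# Hypothetical haplotype: a 1-bp insertion followed by a 1-bp deletion.
haplotype = [Variant(10, "A", "AT"), Variant(25, "GC", "G")]
print([frame_call([v]) for v in haplotype])  # ['frameshift', 'frameshift']
print(frame_call(haplotype))                 # in-frame
```

A variant-by-variant annotator would report two loss-of-function calls here, while the joint call does not. The codons between the two indels are still altered, so a fuller context-aware annotation would also need to check that shifted stretch, for example for premature stop codons.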
Affiliation(s)
- Si-Jin Cheng
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Huan Liu
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Yang Ding
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Shuai Jiang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
- Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Center for Bioinformatics, Peking University, Beijing 100871, People's Republic of China
10
Viloria K, Hill NJ. Embracing the complexity of matricellular proteins: the functional and clinical significance of splice variation. Biomol Concepts 2017; 7:117-32. [PMID: 27135623 DOI: 10.1515/bmc-2016-0004] [Received: 02/04/2016] [Accepted: 03/24/2016]
Abstract
Matricellular proteins influence wide-ranging fundamental cellular processes including cell adhesion, migration, growth and differentiation. They achieve this both through interactions with cell surface receptors and regulation of the matrix environment. Many matricellular proteins are also associated with diverse clinical disorders including cancer and diabetes. Alternative splicing is a precisely regulated process that can produce multiple isoforms with variable functions from a single gene. To date, the expression of alternate transcripts for the matricellular family has been reported for only a handful of genes. Here we analyse the evidence for alternative splicing across the matricellular family including the secreted protein acidic and rich in cysteine (SPARC), thrombospondin, tenascin and CCN families. We find that matricellular proteins have double the average number of splice variants per gene, and discuss the types of domain affected by splicing in matricellular proteins. We also review the clinical significance of alternative splicing for three specific matricellular proteins that have been relatively well characterised: osteopontin (OPN), tenascin-C (TNC) and periostin. Embracing the complexity of matricellular splice variants will be important for understanding the sometimes contradictory function of these powerful regulatory proteins, and for their effective clinical application as biomarkers and therapeutic targets.
11
12
Stein S, Bahrami-Samani E, Xing Y. Using RNA-Seq to Discover Genetic Polymorphisms That Produce Hidden Splice Variants. Methods Mol Biol 2017; 1648:129-142. [PMID: 28766294 DOI: 10.1007/978-1-4939-7204-3_10]
Abstract
RNA-seq is a powerful and popular technology for studying posttranscriptional regulation of gene expression, such as alternative splicing. The first step in analyzing RNA-seq data is to map the sequenced reads back to the genome. However, commonly used RNA-seq aligners rely on consensus splice site dinucleotide motifs to map reads across splice junctions. This can be misleading when genomic variants create novel splice site dinucleotides, leaving reads that span such personal splice junctions unmapped to the reference genome. We developed and evaluated a method called RNA Personal Genome Alignment Analyzer (rPGA) to identify "hidden" splicing variations in personal transcriptomes by mapping personal RNA-seq data to personal genomes. Our work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large, previously unknown catalog of splicing variations in human populations.
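The problem described above, a variant creating a splice-site dinucleotide that the reference sequence lacks, can be checked directly for a single-nucleotide variant. The sketch below illustrates only that observation and is not part of rPGA. It assumes the sequence is given on the transcribed strand, considers only the canonical GT (donor) and AG (acceptor) dinucleotides, and ignores the wider splice-site consensus; the function name and example sequence are hypothetical.

```python
def created_splice_dinucleotides(ref_seq, pos, alt):
    """Return the canonical splice-site dinucleotides that an SNV creates.

    ref_seq: reference sequence on the transcribed strand
    pos:     0-based position of the SNV in ref_seq
    alt:     alternate base
    Checks the two dinucleotide windows that overlap `pos` and reports
    ('donor', i) for a new GT or ('acceptor', i) for a new AG starting at i.
    """
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    created = set()
    for i in (pos - 1, pos):
        if i < 0 or i + 2 > len(ref_seq):
            continue  # window would run off either end of the sequence
        ref_di, alt_di = ref_seq[i:i + 2], alt_seq[i:i + 2]
        for motif, label in (("GT", "donor"), ("AG", "acceptor")):
            if alt_di == motif and ref_di != motif:
                created.add((label, i))
    return created


# Hypothetical sequence: a C>G change at index 4 creates both an AG and a GT.
print(sorted(created_splice_dinucleotides("TTCACTTA", 4, "G")))
# [('acceptor', 3), ('donor', 4)]
print(created_splice_dinucleotides("TTCACTTA", 0, "A"))  # set()
```

Reads spliced at such a site carry a junction whose reference-genome flanks do not match GT/AG, which is why an aligner that relies on those motifs can fail to place them, whereas aligning against the personal sequence restores the motif.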
Affiliation(s)
- Shayna Stein
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Emad Bahrami-Samani
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA
- Yi Xing
- Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA