1
Yuan CU, Quah FX, Hemberg M. Single-cell and spatial transcriptomics: Bridging current technologies with long-read sequencing. Mol Aspects Med 2024; 96:101255. [PMID: 38368637 DOI: 10.1016/j.mam.2024.101255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/30/2024] [Accepted: 02/07/2024] [Indexed: 02/20/2024]
Abstract
Single-cell technologies have transformed biomedical research over the last decade, opening up new possibilities for understanding cellular heterogeneity, both at the genomic and transcriptomic level. In addition, more recent developments of spatial transcriptomics technologies have made it possible to profile cells in their tissue context. In parallel, there have been substantial advances in sequencing technologies, and the third generation of methods is able to produce reads that are tens of kilobases long, with error rates matching those of second-generation short reads. Long-read technologies make it possible to better map large genome rearrangements and quantify isoform-specific abundances. This further improves our ability to characterize functionally relevant heterogeneity. Here, we show how researchers have begun to combine single-cell, spatial transcriptomics, and long-read technologies, and how this is resulting in powerful new approaches to profiling both the genome and the transcriptome. We discuss the achievements so far, and we highlight remaining challenges and opportunities.
Affiliation(s)
- Chengwei Ulrika Yuan
  - Department of Biochemistry, University of Cambridge, Cambridge, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
- Fu Xiang Quah
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Martin Hemberg
  - Gene Lay Institute, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
2
Adams M, Vollmers C. Generation and analysis of a mouse multi-tissue genome annotation atlas. bioRxiv 2024:2024.01.31.578267. [PMID: 38352519 PMCID: PMC10862843 DOI: 10.1101/2024.01.31.578267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]
Abstract
Generating an accurate and complete genome annotation for an organism is complex because the cells within each tissue can express a unique set of transcript isoforms from a unique set of genes. A comprehensive genome annotation should contain information on what tissues express what transcript isoforms at what level. This tissue-level isoform information can then inform a wide range of research questions as well as experiment designs. Long-read sequencing technology combined with advanced full-length cDNA library preparation methods has now achieved throughput and accuracy where generating these types of annotations is achievable. Here, we show this by generating a genome annotation of the mouse (Mus musculus). We used the nanopore-based R2C2 long-read sequencing method to generate 64 million highly accurate full length cDNA consensus reads - averaging 5.4 million reads per tissue for a dozen tissues. Using the Mandalorion tool we processed these reads to generate the Tissue-level Atlas of Mouse Isoforms (TAMI - available at https://genome.ucsc.edu/s/vollmers/TAMI) which we believe will be a valuable complement to conventional, manually curated reference genome annotations.
Affiliation(s)
- Matthew Adams
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz
3
Conte MI, Fuentes-Trillo A, Domínguez Conde C. Opportunities and tradeoffs in single-cell transcriptomic technologies. Trends Genet 2024; 40:83-93. [PMID: 37953195 DOI: 10.1016/j.tig.2023.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/26/2023] [Accepted: 10/03/2023] [Indexed: 11/14/2023]
Abstract
Recent technological and algorithmic advances enable single-cell transcriptomic analysis with remarkable depth and breadth. Nonetheless, a persistent challenge is the compromise between the ability to profile high numbers of cells and the achievement of full-length transcript coverage. Currently, the field is progressing and developing new and creative solutions that improve cellular throughput, gene detection sensitivity and full-length transcript capture. Furthermore, long-read sequencing approaches for single-cell transcripts are breaking frontiers that have previously blocked full transcriptome characterization. We here present a comprehensive overview of available options for single-cell transcriptome profiling, highlighting the key advantages and disadvantages of each approach.
Affiliation(s)
- Matilde I Conte
  - Human Technopole, Viale Rita Levi-Montalcini 1, 20157 Milan, Italy
4
Maina S, Norton SL, Rodoni BC. Hybrid RNA sequencing of broad bean wilt virus 2 from faba beans. Microbiol Spectr 2023; 11:e0266323. [PMID: 37823658 PMCID: PMC10714761 DOI: 10.1128/spectrum.02663-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/01/2023] [Indexed: 10/13/2023] Open
Abstract
IMPORTANCE Globally, viral diseases impair the growth and vigor of cultivated crops such as grains, leading to a significant reduction in quality, marketability, and competitiveness. As an island nation, Australia has a distinct advantage in using its border to prevent the introduction of damaging viruses, which threaten the continental agricultural sector. However, breeding programs in Australia rely on imported seeds as new sources of genetic diversity. As such, it is critical to remain vigilant in identifying new and emerging viral pathogens, by ensuring the availability of accurate genomic diagnostic tools at the grain biosecurity border. High-throughput sequencing offers game-changing opportunities in biosecurity routine testing. Genomic results are more accurate and informative compared to traditional molecular methods or biological indexing. The present work contributes to strengthening accurate phytosanitary screening, to safeguard the Australian grains industry, and expedite germplasm release to the end users.
Affiliation(s)
- Solomon Maina
  - NSW Department of Primary Industries, Biosecurity & Food Safety, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, NSW, Australia
  - Australian Grains Genebank, Agriculture Victoria, Horsham, Victoria, Australia
- Sally L. Norton
  - Australian Grains Genebank, Agriculture Victoria, Horsham, Victoria, Australia
- Brendan C. Rodoni
  - Microbial Sciences, Pests & Diseases, Agriculture Victoria, AgriBio, Ring Road, Bundoora, Victoria, Australia
  - School of Applied Systems Biology (SASB), La Trobe University, Bundoora, Victoria, Australia
5
Zhang C, Fang Y, Chen W, Chen Z, Zhang Y, Xie Y, Chen W, Xie Z, Guo M, Wang J, Tan C, Wang H, Tang C. Improving the RNA velocity approach with single-cell RNA lifecycle (nascent, mature and degrading RNAs) sequencing technologies. Nucleic Acids Res 2023; 51:e112. [PMID: 37941145 PMCID: PMC10711548 DOI: 10.1093/nar/gkad969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 09/27/2023] [Accepted: 10/14/2023] [Indexed: 11/10/2023] Open
Abstract
We presented an experimental method called FLOUR-seq, which combines BD Rhapsody and nanopore sequencing to detect the RNA lifecycle (including nascent, mature, and degrading RNAs) in cells. Additionally, we updated our HIT-scISOseq V2 to characterize the RNA lifecycle more accurately using 10x Chromium and PacBio sequencing. Most importantly, to explore how single-cell full-length RNA sequencing technologies could help improve the RNA velocity approach, we introduced a new algorithm called 'Region Velocity' to estimate cellular RNA velocity more accurately. We applied this algorithm to study spermiogenesis and compared the performance of FLOUR-seq with PacBio-based HIT-scISOseq V2. Our findings demonstrated that 'Region Velocity' is more suitable for analyzing single-cell full-length RNA data than traditional RNA velocity approaches. These novel methods could be useful for researchers looking to discover full-length RNAs in single cells and comprehensively monitor the RNA lifecycle in cells.
Affiliation(s)
- Weitian Chen
  - BGI, Shenzhen 518000, China
  - BGI Education Center, University of Chinese Academy of Sciences, Shenzhen 518083, China
- Ying Zhang
  - Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital), Guangzhou, China; NHC Key Laboratory of Male Reproduction and Genetics, Guangzhou, China
6
Dong X, Du MRM, Gouil Q, Tian L, Jabbari JS, Bowden R, Baldoni PL, Chen Y, Smyth GK, Amarasinghe SL, Law CW, Ritchie ME. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nat Methods 2023; 20:1810-1821. [PMID: 37783886 DOI: 10.1038/s41592-023-02026-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/22/2022] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
The lack of benchmark data sets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other tools among the six isoform detection tools tested, and that DESeq2, edgeR and limma-voom were the best of the five differential transcript expression tools tested. There was no clear front-runner among the five tools compared for differential transcript usage analysis, which suggests that further methods development is needed for this application.
Affiliation(s)
- Xueyi Dong
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Mei R M Du
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Quentin Gouil
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Luyi Tian
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
  - Guangzhou National Laboratory, Guangzhou, China
- Jafar S Jabbari
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Rory Bowden
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Pedro L Baldoni
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Yunshun Chen
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Gordon K Smyth
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
- Shanika L Amarasinghe
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
  - The Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Charity W Law
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Matthew E Ritchie
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
7
Li J, Xiao Z, Wang D, Jia L, Nie S, Zeng X, Hu W. The screening, identification, design and clinical application of tumor-specific neoantigens for TCR-T cells. Mol Cancer 2023; 22:141. [PMID: 37649123 PMCID: PMC10466891 DOI: 10.1186/s12943-023-01844-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/02/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023] Open
Abstract
Recent advances in neoantigen research have accelerated the development of tumor immunotherapies, including adoptive cell therapies (ACTs), cancer vaccines and antibody-based therapies, particularly for solid tumors. With the development of next-generation sequencing and bioinformatics technology, the rapid identification and prediction of tumor-specific antigens (TSAs) has become possible. Compared with tumor-associated antigens (TAAs), highly immunogenic TSAs provide new targets for personalized tumor immunotherapy and can be used as prospective indicators for predicting tumor patient survival, prognosis, and immune checkpoint blockade response. Here, the identification and characterization of neoantigens and the clinical application of neoantigen-based TCR-T immunotherapy strategies are summarized, and the current status, inherent challenges, and clinical translational potential of these strategies are discussed.
Affiliation(s)
- Jiangping Li
  - Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, People's Republic of China
- Zhiwen Xiao
  - Department of Otolaryngology Head and Neck Surgery, The Sixth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510655, People's Republic of China
- Donghui Wang
  - Department of Radiation Oncology, The Third Affiliated Hospital Sun Yat-Sen University, Guangzhou, 510630, People's Republic of China
- Lei Jia
  - International Health Medicine Innovation Center, Shenzhen University, Shenzhen, 518060, People's Republic of China
- Shihong Nie
  - Department of Radiation Oncology, West China Hospital, Sichuan University, Cancer Center, Chengdu, 610041, People's Republic of China
- Xingda Zeng
  - Department of Parasitology of Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
- Wei Hu
  - Division of Vascular Surgery, Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610072, People's Republic of China
8
Deng DZQ, Verhage J, Neudorf C, Corbett-Detig R, Mekonen H, Castaldi PJ, Vollmers C. R2C2+UMI: Combining concatemeric consensus sequencing with unique molecular identifiers enables ultra-accurate sequencing of amplicons on Oxford Nanopore Technologies sequencers. bioRxiv 2023:2023.08.19.553937. [PMID: 37662385 PMCID: PMC10473586 DOI: 10.1101/2023.08.19.553937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
The sequencing of PCR amplicons is a core application of high-throughput sequencing technology. Using unique molecular identifiers (UMIs), individual amplified molecules can be sequenced to very high accuracy on an Illumina sequencer. However, Illumina sequencers have limited read length and are therefore restricted to sequencing amplicons shorter than 600 bp unless using inefficient synthetic long-read approaches. Native long-read sequencers from Pacific Biosciences and Oxford Nanopore Technologies can, using consensus read approaches, match or exceed Illumina quality while achieving much longer read lengths. Using a circularization-based concatemeric consensus sequencing approach (R2C2) paired with UMIs (R2C2+UMI) we show that we can sequence ~550 nt antibody heavy-chain (IGH) and ~1500 nt 16S amplicons at accuracies up to and exceeding Q50 (<1 error in 100,000 sequenced bases), which exceeds accuracies of UMI-supported Illumina paired sequencing as well as synthetic long-read approaches.
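For readers unfamiliar with the Q notation used in this abstract, the following is a minimal sketch of the standard Phred relationship, Q = -10 log10(p), where p is the per-base error probability. Under that definition Q50 corresponds to one expected error per 100,000 bases. The function names are illustrative and do not come from the cited work.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Expected number of sequenced bases per single error at quality Q."""
    return 1 / phred_to_error_prob(q)


if __name__ == "__main__":
    for q in (20, 30, 50):
        print(f"Q{q}: p = {phred_to_error_prob(q):.0e}, "
              f"about 1 error in {round(bases_per_error(q)):,} bases")
```

Because the scale is logarithmic, each additional 10 Q points means a tenfold lower error rate, so Q50 is a hundred times more accurate than the Q30 commonly quoted for short reads.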
Affiliation(s)
- Dori Z Q Deng
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California, USA
- Jack Verhage
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
- Celine Neudorf
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
- Honey Mekonen
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
  - Current address: Chan Zuckerberg Biohub, San Francisco, CA, USA
- Peter J Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, USA
- Christopher Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
9
Volden R, Schimke KD, Byrne A, Dubocanin D, Adams M, Vollmers C. Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion. Genome Biol 2023; 24:167. [PMID: 37461039 DOI: 10.1186/s13059-023-02999-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 06/28/2023] [Indexed: 07/20/2023] Open
Abstract
In this manuscript, we introduce and benchmark Mandalorion v4.1 for the identification and quantification of full-length transcriptome sequencing reads. It further improves upon the already strong performance of Mandalorion v3.6 used in the LRGASP consortium challenge. By processing real and simulated data, we show three main features of Mandalorion: first, Mandalorion-based isoform identification has very high precision and maintains high recall even in the absence of any genome annotation. Second, isoform read counts as quantified by Mandalorion show a high correlation with simulated read counts. Third, isoforms identified by Mandalorion closely reflect the full-length transcriptome sequencing data sets they are based on.
Affiliation(s)
- Roger Volden
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Present Address: Pacific Biosciences, Menlo Park, CA, 94025, USA
- Kayla D Schimke
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Ashley Byrne
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Present Address: Genentech, San Francisco, CA, 94080, USA
- Danilo Dubocanin
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Matthew Adams
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Christopher Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
10
Petri AJ, Sahlin K. isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics 2023; 39:i222-i231. [PMID: 37387174 PMCID: PMC10311309 DOI: 10.1093/bioinformatics/btad264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: With advances in long-read transcriptome sequencing, we can now fully sequence transcripts, which greatly improves our ability to study transcription processes. A popular long-read transcriptome sequencing technique is Oxford Nanopore Technologies (ONT), which through its cost-effective sequencing and high throughput, has the potential to characterize the transcriptome in a cell. However, due to transcript variability and sequencing errors, long cDNA reads need substantial bioinformatic processing to produce a set of isoform predictions from the reads. Several genome and annotation-based methods exist to produce transcript predictions. However, such methods require high-quality genomes and annotations and are limited by the accuracy of long-read splice aligners. In addition, gene families with high heterogeneity may not be well represented by a reference genome and would benefit from reference-free analysis. Reference-free methods to predict transcripts from ONT, such as RATTLE, exist, but their sensitivity is not comparable to reference-based approaches.
RESULTS: We present isONform, a high-sensitivity algorithm to construct isoforms from ONT cDNA sequencing data. The algorithm is based on iterative bubble popping on gene graphs built from fuzzy seeds from the reads. Using simulated, synthetic, and biological ONT cDNA data, we show that isONform has substantially higher sensitivity than RATTLE albeit with some loss in precision. On biological data, we show that isONform's predictions have substantially higher consistency with the annotation-based method StringTie2 compared with RATTLE. We believe isONform can be used both for isoform construction for organisms without well-annotated genomes and as an orthogonal method to verify predictions of reference-based methods.
AVAILABILITY AND IMPLEMENTATION: https://github.com/aljpetri/isONform.
Affiliation(s)
- Alexander J Petri
  - Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
- Kristoffer Sahlin
  - Department of Mathematics, Science for Life Laboratory, Stockholm University, Stockholm 106 91, Sweden
11
Joglekar A, Foord C, Jarroux J, Pollard S, Tilgner HU. From words to complete phrases: insight into single-cell isoforms using short and long reads. Transcription 2023; 14:92-104. [PMID: 37314295 PMCID: PMC10807471 DOI: 10.1080/21541264.2023.2213514] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/31/2022] [Revised: 04/24/2023] [Accepted: 05/07/2023] [Indexed: 06/15/2023] Open
Abstract
The profiling of gene expression patterns to glean biological insights from single cells has become commonplace over the last few years. However, this approach overlooks the transcript contents that can differ between individual cells and cell populations. In this review, we describe early work in the field of single-cell short-read sequencing as well as full-length isoforms from single cells. We then describe recent work in single-cell long-read sequencing wherein some transcript elements have been observed to work in tandem. Based on earlier work in bulk tissue, we motivate the study of combination patterns of other RNA variables. Given that we are still blind to some aspects of isoform biology, we suggest possible future avenues such as CRISPR screens which can further illuminate the function of RNA variables in distinct cell populations.
Affiliation(s)
- Anoushka Joglekar
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Careen Foord
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Julien Jarroux
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Shaun Pollard
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Hagen U Tilgner
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
12
Murray A, Vollmers C, Schmitz RJ. Smar2C2: A Simple and Efficient Protocol for the Identification of Transcription Start Sites. Curr Protoc 2023; 3:e705. [PMID: 36947693 DOI: 10.1002/cpz1.705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2023]
Abstract
Promoters and the noncoding sequences that drive their function are fundamental aspects of genes that are critical to their regulation. The transcription preinitiation complex binds and assembles on promoters where it facilitates transcription. The transcription start site (TSS) is located downstream of the promoter sequence and is defined as the location in the genome where polymerase begins transcribing DNA into RNA. Knowing the location of TSSs is useful for annotation of genes, identification of non-coding sequences important to gene regulation, detection of alternative TSSs, and understanding of 5' UTR content. Several existing techniques make it possible to accurately identify TSSs, but are often difficult to perform experimentally, require large amounts of input RNA, or are unable to identify a large number of TSSs from a single sample. Many of these protocols take advantage of template switching reverse transcriptases (TSRTs), which reliably place an adaptor at the 5' end of a first strand synthesis of cDNA. Here, we introduce a protocol that exploits TSRT activity combined with rolling circle amplification to identify TSSs with several unique advantages over existing methods. Sequence adaptors are placed on the 5' and 3' end of the full-length cDNA copy of a transcript. A splint compatible with those adaptors is then used to circularize the full-length cDNA. Linear DNA containing concatemers of the cDNA are generated using rolling circle amplification, and a sequencing library is formed by fragmenting the concatemers. This protocol is straightforward to execute, requiring limited bench time with relatively stable reagents. Using extremely low amounts of RNA input, this protocol produces large numbers of accurate, deduplicated TSSs genome wide. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
- Basic Protocol 1: Splint generation
- Basic Protocol 2: RNA extraction
- Basic Protocol 3: cDNA synthesis
- Basic Protocol 4: cDNA circularization and amplification
- Basic Protocol 5: Library generation
Affiliation(s)
- Andrew Murray
  - Department of Plant Biology, University of Georgia, Athens, Georgia
- Christopher Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California
13
Bruijnesteijn J. HLA/MHC and KIR characterization in humans and non-human primates using Oxford Nanopore Technologies and Pacific Biosciences sequencing platforms. HLA 2023; 101:205-221. [PMID: 36583332 DOI: 10.1111/tan.14957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 12/12/2022] [Accepted: 12/28/2022] [Indexed: 12/31/2022]
Abstract
The gene products of the HLA/MHC and KIR multigene families are important modulators of the immune system and are associated with health and disease. Characterization of the genes encoding these receptors has been integrated into different biomedical applications, including transplantation and reproduction biology, immune therapies and fundamental research into disease susceptibility or resistance. Conventional short-read sequencing strategies have shown their value in high-throughput typing, but are insufficient to uncover the entire complexity of the highly polymorphic HLA/MHC and KIR gene systems. The implementation of single-molecule and real-time sequencing platforms, offered by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), revolutionized the fields of genomics and transcriptomics. Using fundamentally distinct principles, these platforms generate long-read data that can unravel the plasticity of the HLA/MHC and KIR genes, including high-resolution characterization of genes, alleles, phased haplotypes, transcription levels and epigenetic modification patterns. These insights might have profound clinical relevance, such as improved matching of donors and patients in clinical transplantation, but could also lift disease association studies to a higher level. Moreover, a comprehensive characterization may refine animal models in preclinical studies. In this review, the different HLA/MHC and KIR characterization approaches using PacBio and ONT platforms are described and discussed.
Affiliation(s)
- Jesse Bruijnesteijn
  - Department of Comparative Genetics and Refinement, Biomedical Primate Research Centre, Rijswijk, The Netherlands
14
Zee A, Deng DZQ, Adams M, Schimke KD, Corbett-Detig R, Russell SL, Zhang X, Schmitz RJ, Vollmers C. Sequencing Illumina libraries at high accuracy on the ONT MinION using R2C2. Genome Res 2022; 32:2092-2106. [PMID: 36351772 PMCID: PMC9808628 DOI: 10.1101/gr.277031.122] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/15/2022] [Accepted: 10/21/2022] [Indexed: 11/11/2022]
Abstract
High-throughput short-read sequencing has taken on a central role in research and diagnostics. Hundreds of different assays take advantage of Illumina short-read sequencers, the predominant short-read sequencing technology available today. Although other short-read sequencing technologies exist, the ubiquity of Illumina sequencers in sequencing core facilities and the high capital costs of these technologies have limited their adoption. Among a new generation of sequencing technologies, Oxford Nanopore Technologies (ONT) holds a unique position because the ONT MinION, an error-prone long-read sequencer, is associated with little to no capital cost. Here we show that we can make short-read Illumina libraries compatible with the ONT MinION by using the rolling circle to concatemeric consensus (R2C2) method to circularize and amplify the short library molecules. This results in longer DNA molecules containing tandem repeats of the original short library molecules. This longer DNA is ideally suited for the ONT MinION, and after sequencing, the tandem repeats in the resulting raw reads can be converted into high-accuracy consensus reads with similar error rates to that of the Illumina MiSeq. We highlight this capability by producing and benchmarking RNA-seq, ChIP-seq, and regular and target-enriched Tn5 libraries. We also explore the use of this approach for rapid evaluation of sequencing library metrics by implementing a real-time analysis workflow.
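The idea that tandem repeats within one raw read can be collapsed into a more accurate consensus can be illustrated with a toy example. The sketch below assumes the repeat copies have already been split out and aligned to equal length, and it uses a simple per-position majority vote. This is an illustrative simplification and not the consensus procedure used by the R2C2 authors; the function name and the example sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(copies: list[str]) -> str:
    """Per-position majority vote over pre-aligned, equal-length repeat copies.

    Toy assumption: every copy covers the same positions, so column i of each
    copy refers to the same base of the original molecule.
    """
    if not copies:
        raise ValueError("need at least one repeat copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to equal length")
    # zip(*copies) yields one tuple of bases per alignment column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


if __name__ == "__main__":
    # Four noisy copies of the same hypothetical 6-base molecule.
    repeat_copies = ["ACGTAC", "ACCTAC", "ACGTAC", "ACGTTC"]
    print(majority_consensus(repeat_copies))
```

Independent errors rarely hit the same position in most copies, which is why accuracy improves as the number of repeat copies per raw read increases.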
Affiliation(s)
- Alexander Zee
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Dori Z Q Deng
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Matthew Adams
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Kayla D Schimke
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Shelbi L Russell
  - Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
- Xuan Zhang
  - Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
- Robert J Schmitz
  - Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
- Christopher Vollmers
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA
15
Tung KF, Lin WC. TEx-MST: tissue expression profiles of MANE select transcripts. Database (Oxford) 2022; 2022:6726258. [PMID: 36170113 PMCID: PMC9518666 DOI: 10.1093/database/baac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 12/05/2022]
Abstract
Recently, a new reference transcript dataset [Matched Annotation from the NCBI and EMBL-EBI (MANE) select] was released by NCBI and EMBL-EBI to make available a new unified representative transcript for human protein-coding genes. While the main purpose of the MANE project is to provide a harmonized gene and transcript information standard, there is no explicit tissue expression information about these MANE select transcripts. In this report, we tried to provide useful expression profiles of MANE select transcripts in various normal human tissues to allow further interrogation of their molecular modulations and functional significance. We obtained the new V9 transcript expression dataset from the Genotype-Tissue Expression (GTEx) web portal. This new GTEx dataset, based on a long-read sequencing platform, affords better assessment of the expression of alternative spliced transcripts. This tissue expression profiles of MANE select transcripts (TEx-MST) database not only provides the basic information of MANE select transcripts but also tissue expression profiles on alternative transcripts in protein-coding genes. Users can initiate the interrogation by gene symbol searches or by browsing the MANE genes with various criteria (such as genome locations or expression rankings). We further utilized the GENCODE biotype feature to identify the top-ranked protein-coding transcripts by choosing the most expressed protein-coding transcripts from GTEx datasets (both V8 and V9 datasets). In summary, there are 18 083 genes matched between MANE and GTEx. Among them, 13 245 MANE select transcripts matched with the top-ranked protein-coding transcripts in GTEx V9 dataset, which underlined the dominant expression of MANE select transcripts. This TEx-MST web bioinformatic database provides a visualized user interface for the normal tissue expression patterns of MANE select transcripts using the newly released GTEx dataset.
Database URL: TEx-MST is available at https://texmst.ibms.sinica.edu.tw/
Affiliation(s)
- Kuo-Feng Tung
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Wen-chang Lin
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
  - Institute of Biomedical Informatics, National Yang-Ming Chiao Tung University, Taipei 112, Taiwan, R.O.C
16
Leshkowitz D, Kedmi M, Fried Y, Pilzer D, Keren-Shaul H, Ainbinder E, Dassa B. Exploring differential exon usage via short- and long-read RNA sequencing strategies. Open Biol 2022; 12:220206. [PMID: 36168804 PMCID: PMC9516339 DOI: 10.1098/rsob.220206] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
Abstract
Alternative splicing produces various mRNAs, and thereby various protein products, from one gene, impacting a wide range of cellular activities. However, accurate reconstruction and quantification of full-length transcripts using short-reads is limited, due to their length. Long-reads sequencing technologies may provide a solution by sequencing full-length transcripts. We explored the use of both Illumina short-reads and two long Oxford Nanopore Technology (cDNA and Direct RNA) RNA-Seq reads for detecting global differential splicing during mouse embryonic stem cell differentiation, applying several bioinformatics strategies: gene-based, isoform-based and exon-based. We detected the strongest similarity among the sequencing platforms at the gene level compared to exon-based and isoform-based. Furthermore, the exon-based strategy discovered many differential exon usage (DEU) events, mostly in a platform-dependent manner and in non-differentially expressed genes. Thus, the platforms complemented each other in the ability to detect DEUs (i.e. long-reads exhibited an advantage in detecting DEUs at the UTRs, and short-reads detected more DEUs). Exons within 20 genes, detected in one or more platforms, were here validated by PCR, including key differentiation genes, such as Mdb3 and Aplp1. We provide an important analysis resource for discovering transcriptome changes during stem cell differentiation and insights for analysing such data.
Affiliation(s)
- Dena Leshkowitz
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- Merav Kedmi
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- Yael Fried
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- David Pilzer
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- Hadas Keren-Shaul
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- Elena Ainbinder
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
- Bareket Dassa
  - Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
17
Yamauchi K, Sato M, Osawa L, Matsuda S, Komiyama Y, Nakakuki N, Takada H, Katoh R, Muraoka M, Suzuki Y, Tatsumi A, Miura M, Takano S, Amemiya F, Fukasawa M, Nakayama Y, Yamaguchi T, Inoue T, Maekawa S, Enomoto N. Analysis of direct-acting antiviral-resistant hepatitis C virus haplotype diversity by single-molecule and long-read sequencing. Hepatol Commun 2022; 6:1634-1651. [PMID: 35357088 PMCID: PMC9234623 DOI: 10.1002/hep4.1929] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/10/2021] [Revised: 02/03/2022] [Accepted: 02/04/2022] [Indexed: 11/08/2022]
Abstract
The method of analyzing individual resistant hepatitis C virus (HCV) by a combination of haplotyping and resistance-associated substitution (RAS) has not been fully elucidated because conventional sequencing has only yielded short and fragmented viral genomes. We performed haplotype analysis of HCV mutations in 12 asunaprevir/daclatasvir treatment-failure cases using the Oxford Nanopore sequencer. This enabled single-molecule long-read sequencing using rolling circle amplification (RCA) for correction of the sequencing error. RCA of the circularized reverse-transcription polymerase chain reaction products successfully produced DNA longer than 30 kilobase pairs (kb) containing multiple tandem repeats of a target 3 kb HCV genome. The long-read sequencing of these RCA products could determine the original sequence of the target single molecule as the consensus nucleotide sequence of the tandem repeats and revealed the presence of multiple viral haplotypes with the combination of various mutations in each host. In addition to already known signature RASs, such as NS3-D168 and NS5A-L31/Y93, there were various RASs specific to a different haplotype after treatment failure. The distribution of viral haplotype changed over time; some haplotypes disappeared without acquiring resistant mutations, and other haplotypes, which were not observed before treatment, appeared after treatment. Conclusion: The combination of various mutations other than the known signature RAS was suggested to influence the kinetics of individual HCV quasispecies in the direct-acting antiviral treatment. HCV haplotype dynamic analysis will provide novel information on the role of HCV diversity within the host, which will be useful for elucidating the pathological mechanism of HCV-related diseases.
Affiliation(s)
- Kozue Yamauchi
  - Department of Gastroenterology and Hepatology, Faculty of Medicine, University of Yamanashi, Yamanashi, Japan
18
Tay AP, Hamey JJ, Martyn GE, Wilson LOW, Wilkins MR. Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing. J Proteome Res 2022; 21:1628-1639. [PMID: 35612954 DOI: 10.1021/acs.jproteome.1c00968] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.
Affiliation(s)
- Aidan P Tay
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia
  - Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Joshua J Hamey
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Gabriella E Martyn
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
- Laurence O W Wilson
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Sydney, New South Wales 2113, Australia
  - Applied Biosciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Marc R Wilkins
  - School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales 2052, Australia
19
Shu Z, Wang L, Wang J, Zhang L, Hou X, Yan H, Wang L. Integrative Analysis of Nanopore and Illumina Sequencing Reveals Alternative Splicing Complexity in Pig Longissimus Dorsi Muscle. Front Genet 2022; 13:877646. [PMID: 35480309 PMCID: PMC9035893 DOI: 10.3389/fgene.2022.877646] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/17/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022]
Abstract
Alternative splicing (AS) is a key step in the post-transcriptional regulation of gene expression that can affect intramuscular fat (IMF). In this study, longissimus dorsi muscles from 30 pigs in high- and low- IMF groups were used to perform Oxford Nanopore Technologies (ONT) full-length sequencing and Illumina strand-specific RNA-seq. A total of 43,688 full-length transcripts were identified, with 4,322 novel genes and 30,795 novel transcripts. Using AStalavista, a total of 14,728 AS events were detected in the longissimus dorsi muscle. About 17.79% of the genes produced splicing isoforms, in which exon skipping was the most frequent AS event. By analyzing the expression differences of mRNAs and splicing isoforms, we found that differentially expressed mRNAs with splicing isoforms could participate in skeletal muscle development and fatty acid metabolism, which might determine muscle-related traits. SERBP1, MYL1, TNNT3, and TNNT1 were identified with multiple splicing isoforms, with significant differences in expression. AS events occurring in IFI6 and GADD45G may cause significant differences in gene expression. Other AS events, such as ONT.15153.3, may regulate the function of ART1 by regulating the expression of different transcripts. Moreover, co-expression and protein-protein interaction (PPI) analysis indicated that several genes (MRPL27, AAR2, PYGM, PSMD4, SCNM1, and HNRNPDL) may be related to intramuscular fat. The splicing isoforms investigated in our research provide a reference for the study of alternative splicing regulation of intramuscular fat deposition.
20
Salama SR. The Complexity of the Mammalian Transcriptome. Adv Exp Med Biol 2022; 1363:11-22. [PMID: 35220563 DOI: 10.1007/978-3-030-92034-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Abstract
Draft genome assemblies for multiple mammalian species combined with new technologies to map transcripts from diverse RNA samples to these genomes developed in the early 2000s revealed that the mammalian transcriptome was vastly larger and more complex than previously anticipated. Efforts to comprehensively catalog the identity and features of transcripts present in a variety of species, tissues and cell lines revealed that a large fraction of the mammalian genome is transcribed in at least some settings. A large number of these transcripts encode long non-coding RNAs (lncRNAs). Many lncRNAs overlap or are anti-sense to protein coding genes and others overlap small RNAs. However, a large number are independent of any previously known mRNA or small RNA. While the functions of a majority of these lncRNAs are unknown, many appear to play roles in gene regulation. Many lncRNAs have species-specific and cell type specific expression patterns and their evolutionary origins are varied. While technological challenges have hindered getting a full picture of the diversity and transcript structure of all of the transcripts arising from lncRNA loci, new technologies including single molecule nanopore sequencing and single cell RNA sequencing promise to generate a comprehensive picture of the mammalian transcriptome.
Affiliation(s)
- Sofie R Salama
  - UC Santa Cruz Genomics Institute, Department of Biomolecular Engineering and Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
21
Aguiar VRC, Augusto DG, Castelli EC, Hollenbach JA, Meyer D, Nunes K, Petzl-Erler ML. An immunogenetic view of COVID-19. Genet Mol Biol 2021; 44:e20210036. [PMID: 34436508 PMCID: PMC8388242 DOI: 10.1590/1678-4685-gmb-2021-0036] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/26/2021] [Accepted: 06/12/2021] [Indexed: 02/06/2023]
Abstract
Meeting the challenges brought by the COVID-19 pandemic requires an interdisciplinary approach. In this context, integrating knowledge of immune function with an understanding of how genetic variation influences the nature of immunity is a key challenge. Immunogenetics can help explain the heterogeneity of susceptibility and protection to the viral infection and disease progression. Here, we review the knowledge developed so far, discussing fundamental genes for triggering the innate and adaptive immune responses associated with a viral infection, especially with the SARS-CoV-2 mechanisms. We emphasize the role of the HLA and KIR genes, discussing what has been uncovered about their role in COVID-19 and addressing methodological challenges of studying these genes. Finally, we comment on questions that arise when studying admixed populations, highlighting the case of Brazil. We argue that the interplay between immunology and an understanding of genetic associations can provide an important contribution to our knowledge of COVID-19.
Affiliation(s)
- Vitor R. C. Aguiar
  - Universidade de São Paulo, Departamento de Genética e Biologia Evolutiva, São Paulo, SP, Brazil
- Danillo G. Augusto
  - University of California, UCSF Weill Institute for Neurosciences, Department of Neurology, San Francisco, CA, USA
  - Universidade Federal do Paraná, Departamento de Genética, Curitiba, PR, Brazil
- Erick C. Castelli
  - Universidade Estadual Paulista, Faculdade de Medicina de Botucatu, Departamento de Patologia, Botucatu, SP, Brazil
- Jill A. Hollenbach
  - University of California, UCSF Weill Institute for Neurosciences, Department of Neurology, San Francisco, CA, USA
- Diogo Meyer
  - Universidade de São Paulo, Departamento de Genética e Biologia Evolutiva, São Paulo, SP, Brazil
- Kelly Nunes
  - Universidade de São Paulo, Departamento de Genética e Biologia Evolutiva, São Paulo, SP, Brazil
22
Schulz L, Torres-Diz M, Cortés-López M, Hayer KE, Asnani M, Tasian SK, Barash Y, Sotillo E, Zarnack K, König J, Thomas-Tikhonenko A. Direct long-read RNA sequencing identifies a subset of questionable exitrons likely arising from reverse transcription artifacts. Genome Biol 2021; 22:190. [PMID: 34183059 PMCID: PMC8240250 DOI: 10.1186/s13059-021-02411-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/27/2021] [Accepted: 06/16/2021] [Indexed: 11/24/2022]
Abstract
Resistance to CD19-directed immunotherapies in lymphoblastic leukemia has been attributed, among other factors, to several aberrant CD19 pre-mRNA splicing events, including recently reported excision of a cryptic intron embedded within CD19 exon 2. While "exitrons" are known to exist in hundreds of human transcripts, we discovered, using reporter assays and direct long-read RNA sequencing (dRNA-seq), that the CD19 exitron is an artifact of reverse transcription. Extending our analysis to publicly available datasets, we identified dozens of questionable exitrons, dubbed "falsitrons," that appear only in cDNA-seq, but never in dRNA-seq. Our results highlight the importance of dRNA-seq for transcript isoform validation.
MESH Headings
- Alternative Splicing
- Antibodies, Bispecific/pharmacology
- Antineoplastic Agents, Immunological/pharmacology
- Artifacts
- B-Lymphocytes/drug effects
- B-Lymphocytes/immunology
- B-Lymphocytes/pathology
- Base Pairing
- Base Sequence
- Cell Line, Tumor
- Datasets as Topic
- Exons
- High-Throughput Nucleotide Sequencing
- Humans
- Immunotherapy/methods
- Introns
- Models, Biological
- Nucleic Acid Conformation
- Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy
- Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics
- Precursor Cell Lymphoblastic Leukemia-Lymphoma/immunology
- Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology
- Protein Isoforms/chemistry
- Protein Isoforms/genetics
- Protein Isoforms/immunology
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Messenger/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Reverse Transcription
Affiliation(s)
- Laura Schulz
  - Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Manuel Torres-Diz
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Katharina E Hayer
  - The Bioinformatics Group, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Mukta Asnani
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Sarah K Tasian
  - Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Yoseph Barash
  - Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
- Elena Sotillo
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Present address: Stanford Cancer Institute, 265 Campus Dr., Stanford, CA, 94305, USA
- Kathi Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS) and Faculty of Biological Sciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438, Frankfurt, Germany
- Julian König
  - Institute of Molecular Biology (IMB), Ackermannweg 4, 55128, Mainz, Germany
- Andrei Thomas-Tikhonenko
  - Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology & Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, 19104, USA
23
Lorenzi L, Chiu HS, Avila Cobos F, Gross S, Volders PJ, Cannoodt R, Nuytens J, Vanderheyden K, Anckaert J, Lefever S, Tay AP, de Bony EJ, Trypsteen W, Gysens F, Vromman M, Goovaerts T, Hansen TB, Kuersten S, Nijs N, Taghon T, Vermaelen K, Bracke KR, Saeys Y, De Meyer T, Deshpande NP, Anande G, Chen TW, Wilkins MR, Unnikrishnan A, De Preter K, Kjems J, Koster J, Schroth GP, Vandesompele J, Sumazin P, Mestdagh P. The RNA Atlas expands the catalog of human non-coding RNAs. Nat Biotechnol 2021. [PMID: 34140680 DOI: 10.1038/s41587-021-00936-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
Abstract
Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.
Affiliation(s)
- Lucia Lorenzi
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Hua-Sheng Chiu
  - Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Francisco Avila Cobos
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Pieter-Jan Volders
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Robrecht Cannoodt
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium
  - Data Intuitive, Lebbeke, Belgium
- Justine Nuytens
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Katrien Vanderheyden
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Jasper Anckaert
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Steve Lefever
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Aidan P Tay
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney NSW, Australia
  - Department of Biomedical Sciences, Macquarie University, New South Wales, Sydney NSW, Australia
- Eric J de Bony
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Wim Trypsteen
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Fien Gysens
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Marieke Vromman
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Tine Goovaerts
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Thomas Birkballe Hansen
  - Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Tom Taghon
  - Department of Diagnostic Sciences, Ghent University, Ghent, Belgium
- Karim Vermaelen
  - Department of Respiratory Medicine, Ghent University, Ghent, Belgium
- Ken R Bracke
  - Department of Respiratory Medicine, Ghent University, Ghent, Belgium
- Yvan Saeys
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
- Tim De Meyer
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Nandan P Deshpande
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia
- Govardhan Anande
  - Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia
  - Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia
- Ting-Wen Chen
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Marc R Wilkins
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia
- Ashwin Unnikrishnan
  - Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia
  - Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia
- Katleen De Preter
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Jørgen Kjems
  - Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Jan Koster
  - Department of Oncogenomics, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Jo Vandesompele
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Pavel Sumazin
  - Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Pieter Mestdagh
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
24
Lorenzi L, Chiu HS, Avila Cobos F, Gross S, Volders PJ, Cannoodt R, Nuytens J, Vanderheyden K, Anckaert J, Lefever S, Tay AP, de Bony EJ, Trypsteen W, Gysens F, Vromman M, Goovaerts T, Hansen TB, Kuersten S, Nijs N, Taghon T, Vermaelen K, Bracke KR, Saeys Y, De Meyer T, Deshpande NP, Anande G, Chen TW, Wilkins MR, Unnikrishnan A, De Preter K, Kjems J, Koster J, Schroth GP, Vandesompele J, Sumazin P, Mestdagh P. The RNA Atlas expands the catalog of human non-coding RNAs. Nat Biotechnol 2021; 39:1453-1465. [PMID: 34140680 DOI: 10.1038/s41587-021-00936-1] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 12/10/2019] [Accepted: 04/26/2021] [Indexed: 12/24/2022]
Abstract
Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.
Affiliation(s)
- Lucia Lorenzi
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Hua-Sheng Chiu
  - Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Francisco Avila Cobos
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Pieter-Jan Volders
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Robrecht Cannoodt
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
  - Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium
  - Data Intuitive, Lebbeke, Belgium
- Justine Nuytens
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Katrien Vanderheyden
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Jasper Anckaert
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Steve Lefever
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Aidan P Tay
  - Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney NSW, Australia
  - Department of Biomedical Sciences, Macquarie University, New South Wales, Sydney NSW, Australia
- Eric J de Bony
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Wim Trypsteen
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Fien Gysens
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Marieke Vromman
  - Center for Medical Genetics, Ghent University, Ghent, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Tine Goovaerts
  - Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Thomas Birkballe Hansen
  - Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | | | | | - Tom Taghon
- Department of Diagnostic Sciences, Ghent University, Ghent, Belgium
| | - Karim Vermaelen
- Department of Respiratory Medicine, Ghent University, Ghent, Belgium
| | - Ken R Bracke
- Department of Respiratory Medicine, Ghent University, Ghent, Belgium
| | - Yvan Saeys
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.,Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
| | - Tim De Meyer
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium.,Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Nandan P Deshpande
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia
| | - Govardhan Anande
- Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia.,Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia
| | - Ting-Wen Chen
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Marc R Wilkins
- Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia
| | - Ashwin Unnikrishnan
- Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia.,Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia
| | - Katleen De Preter
- Center for Medical Genetics, Ghent University, Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Jørgen Kjems
- Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Jan Koster
- Department of Oncogenomics, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | | | - Jo Vandesompele
- Center for Medical Genetics, Ghent University, Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Pavel Sumazin
- Texas Children's Cancer Center, Baylor College of Medicine, Houston, TX, USA.
| | - Pieter Mestdagh
- Center for Medical Genetics, Ghent University, Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), Ghent, Belgium.
| |
|
25
|
Vollmers AC, Mekonen HE, Campos S, Carpenter S, Vollmers C. Generation of an isoform-level transcriptome atlas of macrophage activation. J Biol Chem 2021; 296:100784. [PMID: 34000296 PMCID: PMC8191339 DOI: 10.1016/j.jbc.2021.100784] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/02/2021] [Revised: 05/05/2021] [Accepted: 05/10/2021] [Indexed: 01/26/2023] Open
Abstract
RNA-seq is routinely used to measure gene expression changes in response to cell perturbation. Genes upregulated or downregulated following some perturbation are designated as genes of interest, and their most expressed isoform(s) would then be selected for follow-up experimentation. However, because of its need to fragment RNA molecules, RNA-seq is limited in its ability to capture gene isoforms and their expression patterns. This lack of isoform-specific data means that isoforms would be selected based on annotation databases that are incomplete, not tissue specific, or do not provide key information on expression levels. As a result, minority or nonexistent isoforms might be selected for follow-up, leading to loss in valuable resources and time. There is therefore a great need to comprehensively identify gene isoforms along with their corresponding levels of expression. Using the long-read nanopore-based R2C2 method, which does not fragment RNA molecules, we generated an Isoform-level transcriptome Atlas of Macrophage Activation that identifies full-length isoforms in primary human monocyte-derived macrophages. Macrophages are critical innate immune cells important for recognizing pathogens through binding of pathogen-associated molecular patterns to toll-like receptors, culminating in the initiation of host defense pathways. We characterized isoforms for most moderately-to-highly expressed genes in resting and toll-like receptor–activated monocyte-derived macrophages, identified isoforms differentially expressed between conditions, and validated these isoforms by RT-qPCR. We compiled these data into a user-friendly data portal within the UCSC Genome Browser (https://genome.ucsc.edu/s/vollmers/IAMA). Our atlas represents a valuable resource for innate immune research, providing unprecedented isoform information for primary human macrophages.
Affiliation(s)
- Apple Cortez Vollmers
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California, USA
| | - Honey E Mekonen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
| | - Sophia Campos
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
| | - Susan Carpenter
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Cruz, Santa Cruz, California, USA.
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA.
| |
|
26
|
Wang Q, Boenigk S, Boehm V, Gehring NH, Altmueller J, Dieterich C. Single cell transcriptome sequencing on the Nanopore platform with ScNapBar. RNA (New York, N.Y.) 2021; 27:rna.078154.120. [PMID: 33906975 PMCID: PMC8208055 DOI: 10.1261/rna.078154.120] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/16/2020] [Accepted: 04/20/2021] [Indexed: 06/12/2023]
Abstract
The current ecosystem of single cell RNA-seq platforms is rapidly expanding, but robust solutions for single cell and single molecule full-length RNA sequencing are virtually absent. A high-throughput solution that covers all aspects is necessary to study the complex life of mRNA on the single cell level. The Nanopore platform offers long read sequencing and can be integrated with the popular single cell sequencing method on the 10x Chromium platform. However, the high error-rate of Nanopore reads poses a challenge in downstream processing (e.g. for cell barcode assignment). We propose a solution to this particular problem by using a hybrid sequencing approach on Nanopore and Illumina platforms. Our software ScNapBar enables cell barcode assignment with high accuracy, especially if sequencing saturation is low. ScNapBar uses unique molecular identifier (UMI) or Naïve Bayes probabilistic approaches in the barcode assignment, depending on the available Illumina sequencing depth. We have benchmarked the two approaches on simulated and real Nanopore datasets. We further applied ScNapBar to pools of cells with an active or a silenced non-sense mediated RNA decay pathway. Our Nanopore read assignment distinguishes the respective cell populations and reveals characteristic nonsense-mediated mRNA decay events depending on cell status.
Affiliation(s)
- Qi Wang
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg
| | - Sven Boenigk
- Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg
| | | | | | | | | |
|
27
|
Liu X, Andrews MV, Skinner JP, Johanson TM, Chong MMW. A comparison of alternative mRNA splicing in the CD4 and CD8 T cell lineages. Mol Immunol 2021; 133:53-62. [PMID: 33631555 DOI: 10.1016/j.molimm.2021.02.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/11/2020] [Revised: 01/05/2021] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
T cells can be subdivided into a number of different subsets that are defined by their distinct functions. While the specialization of different T cell subsets is partly achieved by the expression of specific genes, the overall transcriptional profiles of all T cells appear very similar. Alternative mRNA splicing is a mechanism that facilitates greater transcript/protein diversity from a limited number of genes, which may contribute to the functional specialization of distinct T cell subsets. In this study we employ a combination of short-read and long-read sequencing technologies to compare alternative mRNA splicing between the CD4 and CD8 T cell lineages. While long-read technology was effective at assembling full-length alternatively spliced transcripts, the low sequencing depth did not facilitate accurate quantitation. On the other hand, short-read technology was ineffective at assembling full-length transcripts but was highly accurate for quantifying expression. We show that integrating long-read and short-read data together achieves a more complete view of transcriptomic diversity. We found that while the overall usage of transcript isoforms was very similar between the CD4 and CD8 lineages, there were numerous alternatively spliced mRNA isoforms that were preferentially used by one lineage over the other. These alternatively spliced isoforms included ones with different exon usage, exon exclusion or intron inclusion, all of which are expected to significantly alter the protein sequence.
Affiliation(s)
- Xin Liu
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
| | - Matthew V Andrews
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
| | - Jarrod P Skinner
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
| | - Timothy M Johanson
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
| | - Mark M W Chong
- St Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia; Department of Medicine (St Vincent's), The University of Melbourne, Fitzroy, Victoria, Australia.
| |
|
28
|
Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 2021; 12:2. [PMID: 33397972 PMCID: PMC7782715 DOI: 10.1038/s41467-020-20340-8] [Citation(s) in RCA: 67] [Impact Index Per Article: 22.3] [Received: 05/27/2020] [Accepted: 11/25/2020] [Indexed: 01/24/2023] Open
Abstract
Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference-bias, error correction is a crucial step towards the reconstruction of the sequenced transcripts and downstream sequence analysis of transcripts. In this paper, we present a novel computational method to error correct ONT cDNA sequencing data, called isONcorrect. IsONcorrect is able to jointly use all isoforms from a gene during error correction, thereby allowing it to correct reads at low sequencing depths. We are able to obtain a median accuracy of 98.9-99.6%, demonstrating the feasibility of applying cost-effective cDNA full transcript length sequencing for reference-free transcriptome analysis.
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA.
| |
|
29
|
Robles-Espinoza CD, Mohammadi P, Bonilla X, Gutierrez-Arcelus M. Allele-specific expression: applications in cancer and technical considerations. Curr Opin Genet Dev 2021; 66:10-19. [PMID: 33383480 PMCID: PMC7985293 DOI: 10.1016/j.gde.2020.10.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/16/2020] [Revised: 10/26/2020] [Accepted: 10/31/2020] [Indexed: 11/18/2022]
Abstract
Allele-specific gene expression can influence disease traits. Non-coding germline genetic variants that alter regulatory elements can cause allele-specific gene expression and contribute to cancer susceptibility. In tumors, both somatic copy number alterations and somatic single nucleotide variants have been shown to lead to allele-specific expression of genes, many of which are considered drivers of tumor growth. Here, we review recent studies revealing the pervasive presence of this phenomenon in cancer susceptibility and progression. Furthermore, we underscore the importance of careful experimental design and computational analysis for accurate allelic expression quantification and avoidance of false positives. Finally, we discuss additional methodological challenges encountered in cancer studies and in the burgeoning field of single-cell transcriptomics.
Affiliation(s)
- Carla Daniela Robles-Espinoza
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Boulevard Juriquilla 3001, Santiago de Querétaro 76230, Mexico; Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Pejman Mohammadi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA; Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA, USA
| | - Ximena Bonilla
- Department of Computer Science, ETH Zurich, Universitätsstr. 6, 8092 Zürich, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, Lausanne 1015, Switzerland; University Hospital Zurich, Rämistrasse 100, 8091 Zürich, Switzerland
| | - Maria Gutierrez-Arcelus
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Division of Immunology, Department of Pediatrics, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
| |
|
30
|
Saferali A, Xu Z, Sheynkman GM, Hersh CP, Cho MH, Silverman EK, Laederach A, Vollmers C, Castaldi PJ. Characterization of a COPD-Associated NPNT Functional Splicing Genetic Variant in Human Lung Tissue via Long-Read Sequencing. medRxiv 2020:2020.10.20.20203927. [PMID: 33173926 PMCID: PMC7654922 DOI: 10.1101/2020.10.20.20203927] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Chronic obstructive pulmonary disease (COPD) is a leading cause of death worldwide. Genome-wide association studies (GWAS) have identified over 80 loci that are associated with COPD and emphysema; however, for most of these loci the causal variant and gene are unknown. Here, we utilize lung splice quantitative trait loci (sQTL) data from the Genotype-Tissue Expression project (GTEx) and short read sequencing data from the Lung Tissue Research Consortium (LTRC) to characterize a locus in nephronectin (NPNT) associated with COPD case-control status and lung function. We found that the rs34712979 variant is associated with alternative splice junction use in NPNT, specifically for the junction connecting the 2nd and 4th exons (chr4:105898001-105927336) (p = 4.02×10⁻³⁸). This association colocalized with GWAS data for COPD and lung spirometry measures with a posterior probability of 94%, indicating that the same causal genetic variants in NPNT underlie the associations with COPD risk, spirometric measures of lung function, and splicing. Investigation of NPNT short read sequencing revealed that rs34712979 creates a cryptic splice acceptor site which results in the inclusion of a 3 nucleotide exon extension, coding for a serine residue near the N-terminus of the protein. Using Oxford Nanopore Technologies (ONT) long read sequencing we identified 13 NPNT isoforms, 6 of which are predicted to be protein coding. Two of these are full length isoforms which differ only in the 3 nucleotide exon extension whose occurrence differs by genotype. Overall, our data indicate that rs34712979 modulates COPD risk and lung function by creating a novel splice acceptor which results in the inclusion of a 3 nucleotide sequence coding for a serine in the nephronectin protein sequence. Our findings implicate NPNT splicing in contributing to COPD risk, and identify a novel serine insertion in the nephronectin protein that warrants further study.
|
31
|
Abstract
Advances in reading, writing, and editing DNA are providing unprecedented insights into the complexity of immunological systems. This combination of systems and synthetic biology methods is enabling the quantitative and precise understanding of molecular recognition in adaptive immunity, thus providing a framework for reprogramming immune responses for translational medicine. In this review, we will highlight state-of-the-art methods such as immune repertoire sequencing, immunoinformatics, and immunogenomic engineering and their application toward adaptive immunity. We showcase novel and interdisciplinary approaches that have the promise of transforming the design and breadth of molecular and cellular immunotherapies.
Affiliation(s)
- Lucia Csepregi
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Roy A. Ehling
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Bastian Wagner
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| |
|
32
|
Miga KH. Centromere studies in the era of 'telomere-to-telomere' genomics. Exp Cell Res 2020; 394:112127. [PMID: 32504677 DOI: 10.1016/j.yexcr.2020.112127] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/08/2020] [Revised: 05/23/2020] [Accepted: 05/30/2020] [Indexed: 12/17/2022]
Abstract
We are entering into an exciting era of genomics where truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from 'telomere-to-telomere' (T2T). This technological advance offers a new opportunity to include endogenous human centromeric regions in high-resolution, sequence-based studies. These emerging reference maps are expected to reveal a new functional landscape in the human genome, where centromere proteins, transcriptional regulation, and spatial organization can be examined with base-level resolution across different stages of development and disease. Such studies will depend on innovative assembly methods of extremely long tandem repeats (ETRs), or satellite DNAs, paired with the development of new, orthogonal validation methods to ensure accuracy and completeness. This review reflects the progress in centromere genomics, credited by recent advancements in long-read sequencing and assembly methods. In doing so, I will discuss the challenges that remain and the promise for a new period of scientific discovery for satellite DNA biology and centromere function.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95064, USA.
| |
|