1
Jang HJ, Shah NM, Maeng JH, Liang Y, Basri NL, Ge J, Qu X, Mahlokozera T, Tzeng SC, Williams RB, Moore MJ, Annamalai D, Chen JY, Lee HJ, DeSouza PA, Li D, Xing X, Kim AH, Wang T. Epigenetic therapy potentiates transposable element transcription to create tumor-enriched antigens in glioblastoma cells. Nat Genet 2024:10.1038/s41588-024-01880-x. [PMID: 39223316 DOI: 10.1038/s41588-024-01880-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Accepted: 07/23/2024] [Indexed: 09/04/2024]
Abstract
Inhibiting epigenetic modulators can transcriptionally reactivate transposable elements (TEs). These TE transcripts often generate unique peptides that can serve as immunogenic antigens for immunotherapy. Here, we ask whether TEs activated by epigenetic therapy could appreciably increase the antigen repertoire in glioblastoma, an aggressive brain cancer with low mutation and neoantigen burden. We treated patient-derived primary glioblastoma stem cell lines, an astrocyte cell line and primary fibroblast cell lines with epigenetic drugs, and identified treatment-induced, TE-derived transcripts that are preferentially expressed in cancer cells. We verified that these transcripts could produce human leukocyte antigen class I-presented antigens using liquid chromatography with tandem mass spectrometry pulldown experiments. Importantly, many TEs were also transcribed, even in proliferating nontumor cell lines, after epigenetic therapy, which suggests that targeted strategies like CRISPR-mediated activation could minimize potential side effects of activating unwanted genomic regions. The results highlight both the need for caution and the promise of future translational efforts in harnessing treatment-induced TE-derived antigens for targeted immunotherapy.
Affiliation(s)
- H Josh Jang
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- Nakul M Shah
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Ju Heon Maeng
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Yonghao Liang
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Noah L Basri
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Jiaxin Ge
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xuan Qu
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Tatenda Mahlokozera
  - Department of Neurological Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Michael J Moore
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Devi Annamalai
  - Department of Neurological Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Justin Y Chen
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Hyung Joo Lee
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Patrick A DeSouza
  - Department of Neurological Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Daofeng Li
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xiaoyun Xing
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Albert H Kim
  - Department of Neurological Surgery, Washington University School of Medicine, St. Louis, MO, USA
  - The Brain Tumor Center, Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Ting Wang
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
2
Quinones-Valdez G, Amoah K, Xiao X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. bioRxiv 2024:2024.06.14.599101. [PMID: 38915585 PMCID: PMC11195283 DOI: 10.1101/2024.06.14.599101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Genetic regulation of alternative splicing constitutes an important link between genetic variation and disease. Nonetheless, RNA splicing is regulated by both cis-acting elements and trans-acting splicing factors. Determining splicing events that are directed primarily by the cis- or trans-acting mechanisms will greatly inform our understanding of the genetic basis of disease. Here, we show that long-read RNA-seq, combined with our new method isoLASER, enables a clear segregation of cis- and trans-directed splicing events for individual samples. The genetic linkage of splicing is largely individual-specific, in stark contrast to the tissue-specific pattern of splicing profiles. Analysis of long-read RNA-seq data from human and mouse revealed thousands of cis-directed splicing events susceptible to genetic regulation. We highlight such events in the HLA genes whose analysis was challenging with short-read data. We also highlight novel cis-directed splicing events in Alzheimer's disease-relevant genes such as MAPT and BIN1. Together, the clear demarcation of cis- and trans-directed splicing paves ways for future studies of the genetic basis of disease.
Affiliation(s)
- Giovanni Quinones-Valdez
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kofi Amoah
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinshu Xiao
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
3
Wang L, Chen H, Zhuang Y, Chen K, Zhang C, Cai T, Yang Q, Fu H, Chen X, Chitkineni A, Wang X, Varshney RK, Zhuang W. Multiple strategies, including 6mA methylation, affecting plant alternative splicing in allopolyploid peanut. Plant Biotechnol J 2024; 22:1681-1702. [PMID: 38294334 PMCID: PMC11123434 DOI: 10.1111/pbi.14296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 11/28/2023] [Accepted: 01/11/2024] [Indexed: 02/01/2024]
Abstract
Alternative splicing (AS), an important post-transcriptional regulation mechanism in eukaryotes, can significantly increase transcript diversity and contribute to gene expression regulation and many other complicated developmental processes. While plant gene AS events are well described, few studies have investigated the comprehensive regulation machinery of plant AS. Here, we use multi-omics to analyse peanut AS events. Using long-read isoform sequencing, 146 464 full-length non-chimeric transcripts were obtained, resulting in annotation corrections for 1782 genes and the identification of 4653 new loci. Using Iso-Seq RNA sequences, 271 776 unique splice junctions were identified, 82.49% of which were supported by transcriptome data. We characterized 50 977 polyadenylation sites for 23 262 genes, 12 369 of which had alternative polyadenylation sites. AS allows differential regulation of the same gene by miRNAs at the isoform level coupled with polyadenylation. In addition, we identified many long non-coding RNAs and fusion transcripts. There is a suppressed effect of 6mA on AS and gene expression. By analysis of chromatin structures, the genes located in the boundaries of topologically associated domains, proximal chromosomal telomere regions, inter- or intra-chromosomal loops were found to have more unique splice isoforms, higher expression, lower 6mA and more transposable elements (TEs) in their gene bodies than the other genes, indicating that chromatin interaction, 6mA and TEs play important roles in AS and gene expression. These results greatly refine the peanut genome annotation and contribute to the study of gene expression and regulation in peanuts. This work also showed AS is associated with multiple strategies for gene regulation.
Affiliation(s)
- Lihui Wang
  - Center for Legume Plant Genetics and System Biology, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Hua Chen
  - Center for Legume Plant Genetics and System Biology, College of Agronomy, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Yuhui Zhuang
  - Center for Legume Plant Genetics and System Biology, College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Kun Chen
  - Center for Legume Plant Genetics and System Biology, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Chong Zhang
  - Center for Legume Plant Genetics and System Biology, College of Agronomy, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Tiecheng Cai
  - Center for Legume Plant Genetics and System Biology, College of Agronomy, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Qiang Yang
  - Center for Legume Plant Genetics and System Biology, College of Agronomy, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Huiwen Fu
  - Center for Legume Plant Genetics and System Biology, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
- Xiangyu Chen
  - Crop Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China
- Annapurna Chitkineni
  - Centre for Crop & Food Innovation, State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, Western Australia, Australia
- Xiyin Wang
  - North China University of Science and Technology, Tangshan, China
- Rajeev K. Varshney
  - Centre for Crop & Food Innovation, State Agricultural Biotechnology Centre, Food Futures Institute, Murdoch University, Murdoch, Western Australia, Australia
- Weijian Zhuang
  - Center for Legume Plant Genetics and System Biology, College of Agronomy, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
4
Chen BJ, Lin CH, Wu HY, Cai JJ, Chao DY. Experimental and analytical pipeline for sub-genomic RNA landscape of coronavirus by Nanopore sequencer. Microbiol Spectr 2024; 12:e0395423. [PMID: 38483513 PMCID: PMC10986531 DOI: 10.1128/spectrum.03954-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 02/26/2024] [Indexed: 04/06/2024]
Abstract
Coronaviruses (CoVs), including severe acute respiratory syndrome coronavirus 2, can infect a variety of mammalian and avian hosts with significant medical and economic consequences. During the life cycle of CoV, a coordinated series of subgenomic RNAs, including canonical subgenomic messenger RNA and non-canonical defective viral genomes (DVGs), are generated with different biological implications. Studies that adopted the Nanopore sequencer (ONT) to investigate the landscape and dynamics of viral RNA subgenomic transcriptomes applied arbitrary bioinformatics parameters without justification or experimental validation. The current study used bovine coronavirus (BCoV), which can be performed under biosafety level 2 for library construction and experimental validation using traditional colony polymerase chain reaction and Sanger sequencing. Four different ONT protocols, including RNA direct and cDNA direct sequencing with or without exonuclease treatment, were used to generate RNA transcriptomic libraries from BCoV-infected cell lysates. Through rigorously examining the k-mer, gap size, segment size, and bin size, the optimal cutoffs for the bioinformatic pipeline were determined to remove the sequence noise while keeping the informative DVG reads. The sensitivity and specificity of identifying DVG reads using the proposed pipeline can reach 82.6% and 99.6% under the k-mer size cutoff of 15. Exonuclease treatment reduced the abundance of RNA transcripts; however, it was not necessary for future library preparation. Additional recovery of clipped BCoV nucleotide sequences with experimental validation expands the landscape of the CoV discontinuous RNA transcriptome, whose biological function requires future investigation. The results of this study provide the benchmarks for library construction and bioinformatic parameters for studying the discontinuous CoV RNA transcriptome.
IMPORTANCE: Functional defective viral genomic RNA, containing all the cis-acting elements required for translation or replication, may play different roles in triggering cell innate immune signaling, interfering with the canonical subgenomic messenger RNA transcription/translation or assisting in establishing persistence infection. This study does not only provide benchmarks for library construction and bioinformatic parameters for studying the discontinuous coronavirus RNA transcriptome but also reveals the complexity of the bovine coronavirus transcriptome, whose functional assays will be critical in future studies.
Affiliation(s)
- Bo-Jia Chen
  - Doctoral Program in Microbial Genomics, National Chung Hsing University and Academia Sinica, Taichung, Taiwan
- Ching-Hung Lin
  - Graduate Institute of Veterinary Pathobiology, College of Veterinary Medicine, National Chung Hsing University, Taichung, Taiwan
- Hung-Yi Wu
  - Graduate Institute of Veterinary Pathobiology, College of Veterinary Medicine, National Chung Hsing University, Taichung, Taiwan
- James J. Cai
  - Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas, USA
- Day-Yu Chao
  - Doctoral Program in Microbial Genomics, National Chung Hsing University and Academia Sinica, Taichung, Taiwan
  - Graduate Institute of Microbiology and Public Health, College of Veterinary Medicine, National Chung Hsing University, Taichung, Taiwan
  - Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan
5
Maeng JH, Jang HJ, Du AY, Tzeng SC, Wang T. Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Genome Res 2023; 33:2143-2155. [PMID: 38065624 PMCID: PMC10760525 DOI: 10.1101/gr.277061.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 11/13/2023] [Indexed: 01/04/2024]
Abstract
Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds the promise to improve cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to human lung cancer cell line H1299 data, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5' transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
Affiliation(s)
- Ju Heon Maeng
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- H Josh Jang
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Alan Y Du
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Shin-Cheng Tzeng
  - Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
- Ting Wang
  - Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63108, USA
6
Zhao J, Li S, Xu Y, Ahmad N, Kuang B, Feng M, Wei N, Yang X. The subgenome Saccharum spontaneum contributes to sugar accumulation in sugarcane as revealed by full-length transcriptomic analysis. J Adv Res 2023; 54:1-13. [PMID: 36781019 DOI: 10.1016/j.jare.2023.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 01/16/2023] [Accepted: 02/03/2023] [Indexed: 02/13/2023]
Abstract
INTRODUCTION: Modern sugarcane cultivars (Saccharum spp. hybrids) are derived from crosses between S. officinarum and S. spontaneum, from which they inherited high-sugar traits and excellent stress tolerance, respectively. However, the contribution of the S. spontaneum subgenome to sucrose accumulation is still unclear.
OBJECTIVE: To compensate for the absence of a high-quality reference genome, a transcriptome analysis method is needed to analyze the molecular basis of differential sucrose accumulation in sugarcane hybrids and to find clues to the contribution of the S. spontaneum subgenome to sucrose accumulation.
METHODS: PacBio full-length sequencing was used to complement genome annotation, followed by the identification of differential genes between the high and low sugar groups using differential alternative splicing analysis and differential expression analysis. At the subgenomic level, the factors responsible for differential sucrose accumulation were investigated from the perspective of transcriptional and post-transcriptional regulation.
RESULTS: A full-length transcriptome annotated at the subgenomic level was provided, complemented by 263,378 allele-defined transcript isoforms and 139,405 alternative splicing (AS) events. Differential alternative splicing (DA) analysis and differential expression (DE) analysis identified differential genes between high and low sugar groups and explained differential sucrose accumulation factors by the KEGG pathways. In some gene models, different or even opposite expression patterns of alleles from the same gene were observed, reflecting the potential evolution of these alleles toward novel functions in polyploid sugarcane. Among DA and DE genes in the sucrose source-sink complex pathway, we found some alleles encoding sucrose accumulation-related enzymes derived from the S. spontaneum subgenome were differentially expressed or had DA events between the two contrasting sugarcane hybrids.
CONCLUSION: Full-length transcriptomes annotated at the subgenomic level could better characterize sugarcane hybrids, and the S. spontaneum subgenome was found to contribute to sucrose accumulation.
Affiliation(s)
- Jihan Zhao
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Sicheng Li
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Yuzhi Xu
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Nazir Ahmad
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
- Bowen Kuang
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Mengfan Feng
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Ni Wei
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
- Xiping Yang
  - State Key Laboratory of Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  - National Demonstration Center for Experimental Plant Science Education, College of Agriculture, Guangxi University, Nanning 530004, China
7
Ma Y, Li J, Yu H, Teng L, Geng H, Li R, Xing R, Liu S, Li P. Comparative analysis of PacBio and ONT RNA sequencing methods for Nemopilema Nomurai venom identification. Genomics 2023; 115:110709. [PMID: 37739021 DOI: 10.1016/j.ygeno.2023.110709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 08/28/2023] [Accepted: 09/18/2023] [Indexed: 09/24/2023]
Abstract
Recent studies on marine organisms have made use of third-generation sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). While these specialized bioinformatics tools have different algorithmic designs and performance capabilities, they offer scalability and can be applied to various datasets. We investigated the effectiveness of PacBio and ONT RNA sequencing methods in identifying the venom of the jellyfish species Nemopilema nomurai. We conducted a detailed analysis of the sequencing data from both methods, focusing on key characteristics such as CD, alternative splicing, long-chain noncoding RNA, simple sequence repeat, transcription factor, and functional transcript annotation. Our findings indicate that ONT generally produced higher raw data quality in the transcriptome analysis, while PacBio generated longer read lengths. PacBio was found to be superior in identifying CDs and long-chain noncoding RNA, whereas ONT was more cost-effective for predicting alternative splicing events, simple sequence repeats, and transcription factors. Based on these results, we conclude that PacBio is the most specific and sensitive method for identifying venom components, while ONT is the most cost-effective method for studying venogenesis, cnidocyst (venom gland) development, and transcription of virulence genes in jellyfish. Our study has implications for future sequencing technologies in marine jellyfish, and highlights the power of full-length transcriptome analysis in discovering potential therapeutic targets for jellyfish dermatitis.
Affiliation(s)
- Yuzhen Ma
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
- Jie Li
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Huahua Yu
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
- Lichao Teng
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Geng
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Rongfeng Li
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
- Ronge Xing
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
- Song Liu
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
- Pengcheng Li
  - Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
  - Laboratory for Marine Drugs and Bioproducts, Pilot National Laboratory for Marine Science and Technology (Qingdao), No. 1 Wenhai Road, Qingdao 266237, China
8
Dong X, Du MRM, Gouil Q, Tian L, Jabbari JS, Bowden R, Baldoni PL, Chen Y, Smyth GK, Amarasinghe SL, Law CW, Ritchie ME. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nat Methods 2023; 20:1810-1821. [PMID: 37783886 DOI: 10.1038/s41592-023-02026-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 07/22/2022] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
The lack of benchmark data sets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed other tools from the six isoform detection tools tested, DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested and there was no clear front-runner for performing differential transcript usage analysis between the five tools compared, which suggests further methods development is needed for this application.
Affiliation(s)
- Xueyi Dong
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Mei R M Du
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Quentin Gouil
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Luyi Tian
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
  - Guangzhou National Laboratory, Guangzhou, China
- Jafar S Jabbari
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Rory Bowden
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Pedro L Baldoni
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Yunshun Chen
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Gordon K Smyth
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
- Shanika L Amarasinghe
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
  - The Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Charity W Law
  - The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
| | - Matthew E Ritchie
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
|
9
|
Torre D, Francoeur NJ, Kalma Y, Gross Carmel I, Melo BS, Deikus G, Allette K, Flohr R, Fridrikh M, Vlachos K, Madrid K, Shah H, Wang YC, Sridhar SH, Smith ML, Eliyahu E, Azem F, Amir H, Mayshar Y, Marazzi I, Guccione E, Schadt E, Ben-Yosef D, Sebra R. Isoform-resolved transcriptome of the human preimplantation embryo. Nat Commun 2023; 14:6902. [PMID: 37903791 PMCID: PMC10616205 DOI: 10.1038/s41467-023-42558-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/17/2023] [Accepted: 10/15/2023] [Indexed: 11/01/2023]
Abstract
Human preimplantation development involves extensive remodeling of RNA expression and splicing. However, its transcriptome has been compiled using short-read sequencing data, which fails to capture most full-length mRNAs. Here, we generate an isoform-resolved transcriptome of early human development by performing long- and short-read RNA sequencing on 73 embryos spanning the zygote to blastocyst stages. We identify 110,212 unannotated isoforms transcribed from known genes, including highly conserved protein-coding loci and key developmental regulators. We further identify 17,964 isoforms from 5,239 unannotated genes, which are largely non-coding, primate-specific, and highly associated with transposable elements. These isoforms are widely supported by the integration of published multi-omics datasets, including single-cell 8CLC and blastoid studies. Alternative splicing and gene co-expression network analyses further reveal that embryonic genome activation is associated with splicing disruption and transient upregulation of gene modules. Together, these findings show that the human embryo transcriptome is far more complex than currently known, and will act as a valuable resource to empower future studies exploring development.
Affiliation(s)
- Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | | | - Yael Kalma
- Fertility and IVF Institute, Tel-Aviv Sourasky Medical Center, Affiliated to Tel Aviv University, Tel Aviv, 64239, Israel
| | - Ilana Gross Carmel
- Fertility and IVF Institute, Tel-Aviv Sourasky Medical Center, Affiliated to Tel Aviv University, Tel Aviv, 64239, Israel
| | - Betsaida S Melo
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Kimaada Allette
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Ron Flohr
- Department of Cell and Developmental Biology, Sackler Faculty of Medicine, Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, 69978, Israel
- CORAL - Center Of Regeneration and Longevity, Tel-Aviv Sourasky Medical Center, Tel Aviv, 64239, Israel
| | - Maya Fridrikh
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | | | - Kent Madrid
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Hardik Shah
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Ying-Chih Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Shwetha H Sridhar
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Melissa L Smith
- Department of Biochemistry and Molecular Genetics, University of Louisville, Louisville, KY, 40202, USA
| | - Efrat Eliyahu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Foad Azem
- Fertility and IVF Institute, Tel-Aviv Sourasky Medical Center, Affiliated to Tel Aviv University, Tel Aviv, 64239, Israel
| | - Hadar Amir
- Fertility and IVF Institute, Tel-Aviv Sourasky Medical Center, Affiliated to Tel Aviv University, Tel Aviv, 64239, Israel
| | - Yoav Mayshar
- Department of Molecular Cell Biology, Weizmann Institute of Science, 7610001, Rehovot, Israel
| | - Ivan Marazzi
- Department of Biological Chemistry, Center for Epigenetics and Metabolism, University of California, Irvine, CA, 92697, USA
| | - Ernesto Guccione
- Center for OncoGenomics and Innovative Therapeutics (COGIT); Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Eric Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Dalit Ben-Yosef
- Fertility and IVF Institute, Tel-Aviv Sourasky Medical Center, Affiliated to Tel Aviv University, Tel Aviv, 64239, Israel
- Department of Cell and Developmental Biology, Sackler Faculty of Medicine, Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv, 69978, Israel
- CORAL - Center Of Regeneration and Longevity, Tel-Aviv Sourasky Medical Center, Tel Aviv, 64239, Israel
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
|
10
|
Hitz BC, Lee JW, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. RESEARCH SQUARE 2023:rs.3.rs-3111932. [PMID: 37503119 PMCID: PMC10371165 DOI: 10.21203/rs.3.rs-3111932/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/29/2023]
Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Department of Genetics, Department of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
|
11
|
Reese F, Williams B, Balderrama-Gutierrez G, Wyman D, Çelik MH, Rebboah E, Rezaie N, Trout D, Razavi-Mohseni M, Jiang Y, Borsari B, Morabito S, Liang HY, McGill CJ, Rahmanian S, Sakr J, Jiang S, Zeng W, Carvalho K, Weimer AK, Dionne LA, McShane A, Bedi K, Elhajjajy SI, Upchurch S, Jou J, Youngworth I, Gabdank I, Sud P, Jolanki O, Strattan JS, Kagda MS, Snyder MP, Hitz BC, Moore JE, Weng Z, Bennett D, Reinholdt L, Ljungman M, Beer MA, Gerstein MB, Pachter L, Guigó R, Wold BJ, Mortazavi A. The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.15.540865. [PMID: 37292896 PMCID: PMC10245583 DOI: 10.1101/2023.05.15.540865] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 06/10/2023]
Abstract
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
Affiliation(s)
- Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Muhammed Hasan Çelik
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Elisabeth Rebboah
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Narges Rezaie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Milad Razavi-Mohseni
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Samuel Morabito
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Heidi Yahan Liang
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Cassandra J McGill
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Jasmine Sakr
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA
| | - Shan Jiang
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Weihua Zeng
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Klebea Carvalho
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Annika K Weimer
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Louise A Dionne
- The Jackson Laboratory, Bar Harbor, USA
| | - Ariel McShane
- Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA
- Department of Radiation Oncology, University of Michigan, Ann Arbor, USA
| | - Karan Bedi
- Department of Biostatistics, University of Michigan, Ann Arbor, USA
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
| | - Shaimae I Elhajjajy
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Sean Upchurch
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Jennifer Jou
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ingrid Youngworth
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Ben C Hitz
- Department of Genetics, Stanford University School of Medicine, Palo Alto, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA
| | - David Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, USA
| | - Laura Reinholdt
- The Jackson Laboratory, Bar Harbor, USA
| | - Mats Ljungman
- Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA
- Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA
| | - Michael A Beer
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA
- Department of Statistics and Data Science, Yale University, New Haven, USA
- Department of Computer Science, Yale University, New Haven, USA
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Barbara J Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
|
12
|
Hitz BC, Lee JW, Jolanki O, Kagda MS, Graham K, Sud P, Gabdank I, Strattan JS, Sloan CA, Dreszer T, Rowe LD, Podduturi NR, Malladi VS, Chan ET, Davidson JM, Ho M, Miyasato S, Simison M, Tanaka F, Luo Y, Whaling I, Hong EL, Lee BT, Sandstrom R, Rynes E, Nelson J, Nishida A, Ingersoll A, Buckley M, Frerker M, Kim DS, Boley N, Trout D, Dobin A, Rahmanian S, Wyman D, Balderrama-Gutierrez G, Reese F, Durand NC, Dudchenko O, Weisz D, Rao SSP, Blackburn A, Gkountaroulis D, Sadr M, Olshansky M, Eliaz Y, Nguyen D, Bochkov I, Shamim MS, Mahajan R, Aiden E, Gingeras T, Heath S, Hirst M, Kent WJ, Kundaje A, Mortazavi A, Wold B, Cherry JM. The ENCODE Uniform Analysis Pipelines. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.04.535623. [PMID: 37066421 PMCID: PMC10104020 DOI: 10.1101/2023.04.04.535623] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 04/18/2023]
Abstract
The Encyclopedia of DNA elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19000 functional genomics experiments across more than 1000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/) is publicly available in GitHub, with images available on Dockerhub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections - a prerequisite for successful integrative analyses.
Affiliation(s)
- Benjamin C Hitz
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Lee Jin-Wook
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Keenan Graham
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Paul Sud
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Cricket A Sloan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Timothy Dreszer
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Laurence D Rowe
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nikhil R Podduturi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Venkat S Malladi
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Esther T Chan
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jean M Davidson
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Marcus Ho
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stuart Miyasato
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Matt Simison
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Forrest Tanaka
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Yunhai Luo
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Ian Whaling
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Eurie L Hong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Brian T Lee
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Richard Sandstrom
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Eric Rynes
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Jemma Nelson
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Andrew Nishida
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Alyssa Ingersoll
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Michael Buckley
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Mark Frerker
- Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, 6th Floor, Seattle, WA 98121, USA
| | - Daniel S Kim
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Nathan Boley
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sorena Rahmanian
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Dana Wyman
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | | | - Fairlie Reese
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Neva C Durand
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Computer Science, Rice University, Houston, TX 77030, USA
| | - Olga Dudchenko
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Suhas S P Rao
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
| | - Alyssa Blackburn
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Dimos Gkountaroulis
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Mahdi Sadr
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moshe Olshansky
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Yossi Eliaz
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dat Nguyen
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ivan Bochkov
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Muhammad Saad Shamim
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
- Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ragini Mahajan
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Department of BioSciences, Rice University, Houston, TX 77005, USA
| | - Erez Aiden
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
| | - Tom Gingeras
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Simon Heath
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Martin Hirst
- Michael Smith Laboratories, University of British Columbia, British Columbia, Canada
| | - W James Kent
- Genomics Institute, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anshul Kundaje
- Dept. of Genetics, Dept. of Computer Science, Stanford University, 240 Pasteur Drive, Palo Alto, CA 94304, USA
| | - Ali Mortazavi
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125 USA
| | - J Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| |
|
13
|
Spatial transcriptomics reveals niche-specific enrichment and vulnerabilities of radial glial stem-like cells in malignant gliomas. Nat Commun 2023; 14:1028. [PMID: 36823172 PMCID: PMC9950149 DOI: 10.1038/s41467-023-36707-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 02/18/2022] [Accepted: 02/14/2023] [Indexed: 02/25/2023] Open
Abstract
Diffuse midline glioma-H3K27M mutant (DMG) and glioblastoma (GBM) are the most lethal brain tumors that primarily occur in pediatric and adult patients, respectively. Both tumors exhibit significant heterogeneity, shaped by distinct genetic/epigenetic drivers, transcriptional programs including RNA splicing, and microenvironmental cues in glioma niches. However, the spatial organization of cellular states and niche-specific regulatory programs remain to be investigated. Here, we perform a spatial profiling of DMG and GBM combining short- and long-read spatial transcriptomics, and single-cell transcriptomic datasets. We identify clinically relevant transcriptional programs, RNA isoform diversity, and multi-cellular ecosystems across different glioma niches. We find that while the tumor core enriches for oligodendrocyte precursor-like cells, radial glial stem-like (RG-like) cells are enriched in the neuron-rich invasive niche in both DMG and GBM. Further, we identify niche-specific regulatory programs for RG-like cells, and functionally confirm that FAM20C mediates invasive growth of RG-like cells in a neuron-rich microenvironment in a human neural stem cell derived orthotopic DMG model. Together, our results provide a blueprint for understanding the spatial architecture and niche-specific vulnerabilities of DMG and GBM.
|
14
|
Abstract
Microchromosomes are prevalent in nonmammalian vertebrates [P. D. Waters et al., Proc. Natl. Acad. Sci. U.S.A. 118 (2021)], but a few of them are missing in bird genome assemblies. Here, we present a new chicken reference genome containing all autosomes, a Z and a W chromosome, with all gaps closed except for the W. We identified ten small microchromosomes (termed dot chromosomes) with distinct sequence and epigenetic features, among which six were newly assembled. Those dot chromosomes exhibit extremely high GC content and a high level of DNA methylation and are enriched for housekeeping genes. The pericentromeric heterochromatin of dot chromosomes is disproportionately large and continues to expand with the proliferation of satellite DNA and testis-expressed genes. Our analyses revealed that the 41-bp CNM repeat frequently forms higher-order repeats (HORs) at the centromeres of acrocentric chromosomes. The centromere core regions where the kinetochore attaches often encompass telomeric sequence (TTAGGG)n, and in one of the dot chromosomes, the centromere core recruits an endogenous retrovirus (ERV). We further demonstrate that the W chromosome shares some common features with dot chromosomes, having large arrays of hypermethylated tandem repeats. Finally, using the complete chicken chromosome models, we reconstructed a fine picture of chordate karyotype evolution, revealing frequent chromosomal fusions before and after vertebrate whole-genome duplications. Our sequence and epigenetic characterization of chicken chromosomes sheds light on vertebrate genome evolution and chromosome biology.
|
15
|
Tran KM, Kawauchi S, Kramár EA, Rezaie N, Liang HY, Sakr JS, Gomez-Arboledas A, Arreola MA, Cunha CD, Phan J, Wang S, Collins S, Walker A, Shi KX, Neumann J, Filimban G, Shi Z, Milinkeviciute G, Javonillo DI, Tran K, Gantuz M, Forner S, Swarup V, Tenner AJ, LaFerla FM, Wood MA, Mortazavi A, MacGregor GR, Green KN. A Trem2 R47H mouse model without cryptic splicing drives age- and disease-dependent tissue damage and synaptic loss in response to plaques. Mol Neurodegener 2023; 18:12. [PMID: 36803190 PMCID: PMC9938579 DOI: 10.1186/s13024-023-00598-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 01/19/2023] [Indexed: 02/19/2023] Open
Abstract
BACKGROUND The TREM2 R47H variant is one of the strongest genetic risk factors for late-onset Alzheimer's Disease (AD). Unfortunately, many current Trem2 R47H mouse models are associated with cryptic mRNA splicing of the mutant allele that produces a confounding reduction in protein product. To overcome this issue, we developed the Trem2R47H NSS (Normal Splice Site) mouse model in which the Trem2 allele is expressed at a similar level to the wild-type Trem2 allele without evidence of cryptic splicing products. METHODS Trem2R47H NSS mice were treated with the demyelinating agent cuprizone, or crossed with the 5xFAD mouse model of amyloidosis, to explore the impact of the TREM2 R47H variant on inflammatory responses to demyelination, plaque development, and the brain's response to plaques. RESULTS Trem2R47H NSS mice display an appropriate inflammatory response to cuprizone challenge, and do not recapitulate the null allele in terms of impeded inflammatory responses to demyelination. Utilizing the 5xFAD mouse model, we report age- and disease-dependent changes in Trem2R47H NSS mice in response to development of AD-like pathology. At an early (4-month-old) disease stage, hemizygous 5xFAD/homozygous Trem2R47H NSS (5xFAD/Trem2R47H NSS) mice have reduced size and number of microglia that display impaired interaction with plaques compared to microglia in age-matched 5xFAD hemizygous controls. This is associated with a suppressed inflammatory response but increased dystrophic neurites and axonal damage as measured by plasma neurofilament light chain (NfL) level. Homozygosity for Trem2R47H NSS suppressed LTP deficits and loss of presynaptic puncta caused by the 5xFAD transgene array in 4-month-old mice. At a more advanced (12-month-old) disease stage, 5xFAD/Trem2R47H NSS mice no longer display impaired plaque-microglia interaction or suppressed inflammatory gene expression, although NfL levels remain elevated, and a unique interferon-related gene expression signature is seen. Twelve-month-old Trem2R47H NSS mice also display LTP deficits and postsynaptic loss. CONCLUSIONS The Trem2R47H NSS mouse is a valuable model that can be used to investigate age-dependent effects of the AD-risk R47H mutation on TREM2 and microglial function including its effects on plaque development, microglial-plaque interaction, production of a unique interferon signature and associated tissue damage.
Affiliation(s)
- Kristine M. Tran
- Department of Neurobiology and Behavior, University of California, Irvine, USA
| | - Shimako Kawauchi
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Enikö A. Kramár
- Department of Neurobiology and Behavior, University of California, Irvine, USA
| | - Narges Rezaie
- Department of Developmental and Cell Biology, University of California, Irvine, USA
- Center for Complex Biological Systems, Irvine, USA
| | - Heidi Yahan Liang
- Department of Developmental and Cell Biology, University of California, Irvine, USA
- Center for Complex Biological Systems, Irvine, USA
| | - Jasmine S. Sakr
- Department of Pharmaceutical Sciences, University of California, Irvine, USA
| | | | - Miguel A. Arreola
- Department of Neurobiology and Behavior, University of California, Irvine, USA
| | - Celia da Cunha
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Jimmy Phan
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Shuling Wang
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Sherilyn Collins
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Amber Walker
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Kai-Xuan Shi
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Jonathan Neumann
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
| | - Ghassan Filimban
- Department of Developmental and Cell Biology, University of California, Irvine, USA
| | - Zechuan Shi
- Department of Neurobiology and Behavior, University of California, Irvine, USA
| | - Giedre Milinkeviciute
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Dominic I. Javonillo
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Katelynn Tran
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Magdalena Gantuz
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Stefania Forner
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Vivek Swarup
- Department of Neurobiology and Behavior, University of California, Irvine, USA
- Center for Complex Biological Systems, Irvine, USA
| | - Andrea J. Tenner
- Department of Neurobiology and Behavior, University of California, Irvine, USA
- Department of Molecular Biology & Biochemistry, University of California, Irvine, USA
- Department of Pathology and Laboratory Medicine, University of California, Irvine, USA
| | - Frank M. LaFerla
- Department of Neurobiology and Behavior, University of California, Irvine, USA
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Marcelo A. Wood
- Department of Neurobiology and Behavior, University of California, Irvine, USA
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, USA
- Center for Complex Biological Systems, Irvine, USA
| | - Grant R. MacGregor
- Transgenic Mouse Facility, Office of Research, ULAR, Irvine, USA
- Department of Developmental and Cell Biology, University of California, Irvine, USA
| | - Kim N. Green
- Department of Neurobiology and Behavior, University of California, Irvine, USA
- Institute for Memory Impairments and Neurological Disorders, University of California, Irvine, USA
| |
|
16
|
Prjibelski AD, Mikheenko A, Joglekar A, Smetanin A, Jarroux J, Lapidus AL, Tilgner HU. Accurate isoform discovery with IsoQuant using long reads. Nat Biotechnol 2023:10.1038/s41587-022-01565-y. [PMID: 36593406 DOI: 10.1038/s41587-022-01565-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 04/19/2022] [Accepted: 10/13/2022] [Indexed: 01/04/2023]
Abstract
Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.
Affiliation(s)
- Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia; Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Anoushka Joglekar
- Tri-Institutional Computational Biology and Medicine, Weill Cornell Medicine, New York, NY, USA; Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | | | - Julien Jarroux
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | - Alla L Lapidus
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Hagen U Tilgner
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA; Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| |
|
17
|
Kovaka S, Ou S, Jenike KM, Schatz MC. Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing. Nat Methods 2023; 20:12-16. [PMID: 36635537 PMCID: PMC10068675 DOI: 10.1038/s41592-022-01716-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Indexed: 01/13/2023]
Abstract
The year 2022 will be remembered as the turning point for accurate long-read sequencing, which now establishes the gold standard for speed and accuracy at competitive costs. We discuss the key bioinformatics techniques needed to power long reads across application areas and close with our vision for long-read sequencing over the coming years.
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Shujun Ou
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Molecular Genetics, Ohio State University, Columbus, OH, USA
| | - Katharine M Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| |
|
18
|
Nelson TM, Ghosh S, Postler TS. L-RAPiT: A Cloud-Based Computing Pipeline for the Analysis of Long-Read RNA Sequencing Data. Int J Mol Sci 2022; 23:ijms232415851. [PMID: 36555493 PMCID: PMC9781625 DOI: 10.3390/ijms232415851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 12/07/2022] [Accepted: 12/11/2022] [Indexed: 12/15/2022] Open
Abstract
Long-read sequencing (LRS) has been adopted to meet a wide variety of research needs, ranging from the construction of novel transcriptome annotations to the rapid identification of emerging virus variants. Amongst other advantages, LRS preserves more information about RNA at the transcript level than conventional high-throughput sequencing, including far more accurate and quantitative records of splicing patterns. New studies with LRS datasets are being published at an exponential rate, generating a vast reservoir of information that can be leveraged to address a host of different research questions. However, mining such publicly available data in a tailored fashion is currently not easy, as the available software tools typically require familiarity with the command-line interface, which constitutes a significant obstacle to many researchers. Additionally, different research groups utilize different software packages to perform LRS analysis, which often prevents a direct comparison of published results across different studies. To address these challenges, we have developed the Long-Read Analysis Pipeline for Transcriptomics (L-RAPiT), a user-friendly, free pipeline requiring no dedicated computational resources or bioinformatics expertise. L-RAPiT can be implemented directly through Google Colaboratory, a system based on the open-source Jupyter notebook environment, and allows for the direct analysis of transcriptomic reads from Oxford Nanopore and PacBio LRS machines. This new pipeline enables the rapid, convenient, and standardized analysis of publicly available or newly generated LRS datasets.
|
19
|
Ferrández-Peral L, Zhan X, Alvarez-Estape M, Chiva C, Esteller-Cucala P, García-Pérez R, Julià E, Lizano E, Fornas Ò, Sabidó E, Li Q, Marquès-Bonet T, Juan D, Zhang G. Transcriptome innovations in primates revealed by single-molecule long-read sequencing. Genome Res 2022; 32:1448-1462. [PMID: 35840341 PMCID: PMC9435740 DOI: 10.1101/gr.276395.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 07/12/2022] [Indexed: 11/24/2022]
Abstract
Transcriptomic diversity greatly contributes to the fundamentals of disease, lineage-specific biology, and environmental adaptation. However, much of the actual isoform repertoire contributing to shaping primate evolution remains unknown. Here, we combined deep long- and short-read sequencing complemented with mass spectrometry proteomics in a panel of lymphoblastoid cell lines (LCLs) from human, three other great apes, and rhesus macaque, producing the largest full-length isoform catalog in primates to date. Around half of the captured isoforms are not annotated in their reference genomes, significantly expanding the gene models in primates. Furthermore, our comparative analyses unveil hundreds of transcriptomic innovations and isoform usage changes related to immune function and immunological disorders. The confluence of these evolutionary innovations with signals of positive selection and their limited impact in the proteome points to changes in alternative splicing in genes involved in immune response as an important target of recent regulatory divergence in primates.
Affiliation(s)
| | | | | | - Cristina Chiva
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | | | | | - Eva Julià
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
| | - Esther Lizano
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, 08193 Barcelona, Spain
| | - Òscar Fornas
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Eduard Sabidó
- Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
| | - Qiye Li
- BGI-Shenzhen, Shenzhen 518083, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tomàs Marquès-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, 08193 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Center for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08028 Barcelona, Spain
| | - David Juan
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, 08003 Barcelona, Spain
| | - Guojie Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen 2200, Denmark
- Evolutionary and Organismal Biology Research Center, School of Medicine, Zhejiang University, Hangzhou 310058, China
| |
|
20
|
He J, Fu T, Zhang L, Wanrong Gao L, Rensel M, Remage-Healey L, White SA, Gedman G, Whitelegge J, Xiao X, Schlinger BA. Improved zebra finch brain transcriptome identifies novel proteins with sex differences. Gene 2022; 843:146803. [PMID: 35961439 DOI: 10.1016/j.gene.2022.146803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 11/30/2022]
Abstract
The zebra finch (Taeniopygia guttata), a representative oscine songbird species, has been widely studied to investigate behavioral neuroscience, most notably the neurobiological basis of vocal learning, a rare trait shared in only a few animal groups including humans. In 2019, an updated zebra finch genome annotation (bTaeGut1_v1.p) was released from the Ensembl database and is substantially more comprehensive than the first version published in 2010. In this study, we utilized the publicly available RNA-seq data generated from Illumina-based short-reads and PacBio single-molecule real-time (SMRT) long-reads to assess the bird transcriptome. To analyze the high-throughput RNA-seq data, we adopted a hybrid bioinformatic approach combining short and long-read pipelines. From our analysis, we added 220 novel genes and 8,134 transcript variants to the Ensembl annotation, and predicted a new proteome based on the refined annotation. We further validated 18 different novel proteins by using mass-spectrometry data generated from zebra finch caudal telencephalon tissue. Our results provide additional resources for future studies of zebra finches utilizing this improved bird genome annotation and proteome.
Affiliation(s)
- Jingyan He
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
- Ting Fu
  - Molecular, Cellular and Integrative Physiology Interdepartmental Program, University of California, Los Angeles 90095, United States
- Ling Zhang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
- Lucy Wanrong Gao
  - The Pasarow Mass Spectrometry Laboratory, The Jane and Terry Semel Institute for Neuroscience and Human Behavior, Brain Research Institute, David Geffen School of Medicine, University of California, Los Angeles 90095, United States
- Michelle Rensel
  - The Institute for Society and Genetics, University of California, Los Angeles 90095, United States
- Luke Remage-Healey
  - Center for Neuroendocrine Studies, Neuroscience and Behavior, 639 N. Pleasant St, Morrill IVN Neuroscience, University of Massachusetts, Amherst, MA 01003, United States
- Stephanie A White
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
- Gregory Gedman
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
- Julian Whitelegge
  - The Pasarow Mass Spectrometry Laboratory, The Jane and Terry Semel Institute for Neuroscience and Human Behavior, Brain Research Institute, David Geffen School of Medicine, University of California, Los Angeles 90095, United States
- Xinshu Xiao
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
- Barney A Schlinger
  - Department of Integrative Biology and Physiology, University of California, Los Angeles 90095, United States
|
21
|
Kiyose H, Nakagawa H, Ono A, Aikata H, Ueno M, Hayami S, Yamaue H, Chayama K, Shimada M, Wong JH, Fujimoto A. Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer. PLoS Genet 2022; 18:e1010342. [PMID: 35926060 PMCID: PMC9380957 DOI: 10.1371/journal.pgen.1010342] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/19/2022] [Revised: 08/16/2022] [Accepted: 07/14/2022] [Indexed: 12/24/2022] Open
Abstract
Genes generate transcripts of various functions by alternative splicing. However, most transcriptome studies have used short-read sequencing technologies (next-generation sequencers), which do not observe full-length transcripts directly. Although long-read sequencing technologies enable the sequencing of full-length transcripts, the data analysis is difficult. In this study, we developed an analysis pipeline named SPLICE and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 46,663 transcripts from the protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. A comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth. Furthermore, fusion gene detection showed novel recurrent fusion events that were not detected in the short reads. These results suggest the efficiency of full-length transcriptome studies and the importance of splicing variants in carcinogenesis.
Genes generate transcripts of various functions by alternative splicing. However, most transcriptome studies have used short-read sequencing technologies (next-generation sequencers), which do not observe full-length transcripts directly. In this study, we developed an analysis pipeline named SPLICE for long-read transcriptome sequencing and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 5,366 novel transcripts and 9,933 differentially expressed transcripts in 4,744 genes between HCCs and non-cancerous livers. An analysis of hepatitis B virus (HBV) transcripts showed that fusion transcripts of the HBV gene and human transposable elements were overexpressed in HBV-infected HCCs. We also identified fusion genes that were not found in the short reads. These results suggest that long-read sequencing technologies provide a fuller understanding of cancer transcripts and that our method contributes to the analysis of transcriptome sequences by such technologies.
Affiliation(s)
- Hiroki Kiyose
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Hidewaki Nakagawa
  - Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Atsushi Ono
  - Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Hiroshi Aikata
  - Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Masaki Ueno
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Shinya Hayami
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Hiroki Yamaue
  - Department of Surgery II, Wakayama Medical University, Wakayama, Japan
- Kazuaki Chayama
  - Collaborative Research Laboratory of Medical Innovation, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
  - Research Center for Hepatology and Gastroenterology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
  - RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Mihoko Shimada
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Jing Hao Wong
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Akihiro Fujimoto
  - Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
|
22
|
de la Rubia I, Srivastava A, Xue W, Indi JA, Carbonell-Sala S, Lagarde J, Albà MM, Eyras E. RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing. Genome Biol 2022; 23:153. [PMID: 35804393 PMCID: PMC9264490 DOI: 10.1186/s13059-022-02715-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/05/2021] [Accepted: 06/20/2022] [Indexed: 11/04/2022] Open
Abstract
Nanopore sequencing enables the efficient and unbiased measurement of transcriptomes. Current methods for transcript identification and quantification rely on mapping reads to a reference genome, which precludes the study of species with a partial or missing reference or the identification of disease-specific transcripts not readily identifiable from a reference. We present RATTLE, a tool to perform reference-free reconstruction and quantification of transcripts using only Nanopore reads. Using simulated data and experimental data from isoform spike-ins, human tissues, and cell lines, we show that RATTLE accurately determines transcript sequences and their abundances, and shows good scalability with the number of transcripts.
Affiliation(s)
- Ivan de la Rubia
  - EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Pompeu Fabra University (UPF), E08003, Barcelona, Spain
- Akanksha Srivastava
  - EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Australian National University, Acton, Canberra, ACT, 2601, Australia
- Wenjing Xue
  - EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Australian National University, Acton, Canberra, ACT, 2601, Australia
- Joel A Indi
  - EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Universidade de Lisboa, Lisboa, Portugal
- Silvia Carbonell-Sala
  - Pompeu Fabra University (UPF), E08003, Barcelona, Spain
  - Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
- Julien Lagarde
  - Pompeu Fabra University (UPF), E08003, Barcelona, Spain
  - Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
- M Mar Albà
  - Pompeu Fabra University (UPF), E08003, Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain
  - Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Australian National University, Acton, Canberra, ACT, 2601, Australia
  - Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain
  - Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
|
23
|
Lu P, Chen D, Qi Z, Wang H, Chen Y, Wang Q, Jiang C, Xu JR, Liu H. Landscape and regulation of alternative splicing and alternative polyadenylation in a plant pathogenic fungus. New Phytol 2022; 235:674-689. [PMID: 35451076 DOI: 10.1111/nph.18164] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 11/28/2021] [Accepted: 03/30/2022] [Indexed: 06/14/2023]
Abstract
Alternative splicing (AS) and alternative polyadenylation (APA) contribute significantly to the regulation of gene expression in higher eukaryotes. Their biological impact in filamentous fungi, however, is largely unknown. Here we combine PacBio Isoform-Sequencing and strand-specific RNA-sequencing of multiple tissues and mutant characterization to reveal the landscape and regulation of AS and APA in Fusarium graminearum. We generated a transcript annotation comprising 51 617 isoforms from 17 189 genes. In total, 4997 and 11 133 genes are alternatively spliced and polyadenylated, respectively. The majority of the AS events alter coding sequences. Unexpectedly, the AS transcripts containing premature-termination codons are not sensitive to nonsense-mediated messenger RNA decay. Unlike in yeasts and animals, distal APA sites have strong signals, but proximal APA isoforms are highly expressed in F. graminearum. The 3'-end processing factors FgRNA15, FgHRP1, and FgFIP1 play roles in promoting proximal APA site usage and intron splicing. A genome-wide increase in intron inclusion and distal APA site usage, together with downregulation of the spliceosomal and 3'-end processing factors, was observed in older and quiescent tissues, indicating intron inclusion and 3'-untranslated region lengthening as novel mechanisms in regulating aging and dormancy in fungi. This study provides new insights into the complexity and regulation of AS and APA in filamentous fungi.
Affiliation(s)
- Ping Lu
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Daipeng Chen
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Zhaomei Qi
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Haoming Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Yitong Chen
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Qinhu Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Cong Jiang
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
- Jin-Rong Xu
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Huiquan Liu
  - State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, 712100, China
|
24
|
Montañés JC, Huertas M, Moro SG, Blevins WR, Carmona M, Ayté J, Hidalgo E, Albà MM. Native RNA sequencing in fission yeast reveals frequent alternative splicing isoforms. Genome Res 2022; 32:1215-1227. [PMID: 35618415 PMCID: PMC9248878 DOI: 10.1101/gr.276516.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 05/09/2022] [Indexed: 11/25/2022]
Abstract
The unicellular yeast Schizosaccharomyces pombe (fission yeast) retains many of the splicing features observed in humans and is thus an excellent model to study the basic mechanisms of splicing. Nearly half the genes contain introns, but the impact of alternative splicing in gene regulation and proteome diversification remains largely unexplored. Here we leverage Oxford Nanopore Technologies native RNA sequencing (dRNA), as well as ribosome profiling data, to uncover the full range of polyadenylated transcripts and translated open reading frames. We identify 332 alternative isoforms affecting the coding sequences of 262 different genes, 97 of which occur at frequencies >20%, indicating that functional alternative splicing in S. pombe is more prevalent than previously suspected. Intron retention events make up ∼80% of the cases; these events may be involved in the regulation of gene expression and, in some cases, generate novel protein isoforms, as supported by ribosome profiling data in 18 of the intron retention isoforms. One example is the rpl22 gene, in which intron retention is associated with the translation of a protein of only 13 amino acids. We also find that lowly expressed transcripts tend to have longer poly(A) tails than highly expressed transcripts, highlighting an interdependence between poly(A) tail length and transcript expression level. Finally, we discover 214 novel transcripts that are not annotated, including 158 antisense transcripts, some of which also show translation evidence. The methodologies described in this work open new opportunities to study the regulation of splicing in a simple eukaryotic model.
Affiliation(s)
- José Carlos Montañés
  - Evolutionary Genomics Group, Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Marta Huertas
  - Evolutionary Genomics Group, Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Simone G Moro
  - Evolutionary Genomics Group, Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- William R Blevins
  - Evolutionary Genomics Group, Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Mercè Carmona
  - Oxidative Stress and Cell Cycle Group, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- José Ayté
  - Oxidative Stress and Cell Cycle Group, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Elena Hidalgo
  - Oxidative Stress and Cell Cycle Group, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- M Mar Albà
  - Evolutionary Genomics Group, Research Program on Biomedical Informatics, Hospital del Mar Medical Research Institute (IMIM) and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
|
25
|
You Y, Clark MB, Shim H. NanoSplicer: Accurate identification of splice junctions using Oxford Nanopore sequencing. Bioinformatics 2022; 38:3741-3748. [PMID: 35639973 PMCID: PMC9344838 DOI: 10.1093/bioinformatics/btac359] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/22/2021] [Revised: 04/02/2022] [Accepted: 05/24/2022] [Indexed: 11/30/2022] Open
Abstract
Motivation: Long-read sequencing methods have considerable advantages for characterizing RNA isoforms. Oxford Nanopore sequencing records changes in electrical current as a nucleic acid passes through a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies use matched short-read data and/or annotated splice junctions to correct nanopore reads, but these add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages.
Results: We developed ‘NanoSplicer’ to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (i) synthetic mRNAs with known splice junctions and (ii) biological mRNAs from a lung-cancer cell line. The results from both datasets demonstrate that NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated.
Availability and implementation: NanoSplicer is available at https://github.com/shimlab/NanoSplicer and archived at https://doi.org/10.5281/zenodo.6403849. Data are available from ENA: ERS7273757 and ERS7273453.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yupei You
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Michael B Clark
  - Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Heejung Shim
  - School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Melbourne, VIC, 3010, Australia
|
26
|
Wright DJ, Hall NAL, Irish N, Man AL, Glynn W, Mould A, Angeles ADL, Angiolini E, Swarbreck D, Gharbi K, Tunbridge EM, Haerty W. Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics 2022; 23:42. [PMID: 35012468 PMCID: PMC8744310 DOI: 10.1186/s12864-021-08261-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/11/2021] [Accepted: 12/15/2021] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: Alternative splicing is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, which isoforms are differentially used or expressed, and how this affects cellular differentiation, remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of alternative splicing processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line, SH-SY5Y, and to characterise isoform expression and usage across differentiation.
RESULTS: We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, identifying candidates for future research into state change regulation.
CONCLUSIONS: Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.
Affiliation(s)
- David J Wright
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Nicola A L Hall
  - Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
  - Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
- Naomi Irish
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Angela L Man
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Will Glynn
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Arne Mould
  - Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
  - Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
- Alejandro De Los Angeles
  - Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
  - Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
- Emily Angiolini
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- David Swarbreck
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Karim Gharbi
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
- Elizabeth M Tunbridge
  - Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
  - Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
- Wilfried Haerty
  - Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
|
27
|
Rebboah E, Reese F, Williams K, Balderrama-Gutierrez G, McGill C, Trout D, Rodriguez I, Liang H, Wold BJ, Mortazavi A. Mapping and modeling the genomic basis of differential RNA isoform expression at single-cell resolution with LR-Split-seq. Genome Biol 2021; 22:286. [PMID: 34620214 PMCID: PMC8495978 DOI: 10.1186/s13059-021-02505-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 04/28/2021] [Accepted: 09/20/2021] [Indexed: 11/24/2022] Open
Abstract
The rise in throughput and quality of long-read sequencing should allow unambiguous identification of full-length transcript isoforms. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here we develop and characterize long-read Split-seq (LR-Split-seq), which uses combinatorial barcoding to sequence single cells with long reads. Applied to the C2C12 myogenic system, LR-Split-seq associates isoforms to cell types with relative economy and design flexibility. We find widespread evidence of changing isoform expression during differentiation, including alternative transcription start sites (TSS) and/or alternative internal exon usage. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells.
Affiliation(s)
- Elisabeth Rebboah
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Fairlie Reese
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Katherine Williams
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Gabriela Balderrama-Gutierrez
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Cassandra McGill
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Diane Trout
  - Division of Biology, California Institute of Technology, Pasadena, CA, 91125, USA
- Isaryhia Rodriguez
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Heidi Liang
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
- Barbara J Wold
  - Division of Biology, California Institute of Technology, Pasadena, CA, 91125, USA
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
|
28
|
Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021; 11:life11080862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022] Open
Abstract
Long-read sequencing platforms such as Pacific Biosciences (PacBio; Menlo Park, CA, USA) and Oxford Nanopore Technologies (ONT; Oxford, UK) offer advantages that research fields such as genomics and transcriptomics can exploit. Selecting an appropriate sequencing platform is crucial for the success of the research outcome, so there is a need to compare these long-read sequencing platforms and evaluate them for specific research questions. This study aims to compare the performance of PacBio and ONT platforms for transcriptomic analysis by utilizing transcriptome data from three different tissues (hepatopancreas, intestine, and gonads) of the juvenile black tiger shrimp, Penaeus monodon. We compared three important features: (i) main characteristics of the sequencing libraries and their alignment with the reference genome, (ii) transcript assembly features and isoform identification, and (iii) correlation of the quantification of gene expression levels for both platforms. Our analyses suggest that read-length bias and differences in sequencing throughput are highly influential factors when using long reads in transcriptome studies. These comparisons can provide a guideline when designing a transcriptome study utilizing these two long-read sequencing technologies.
|
29
|
Dong X, Tian L, Gouil Q, Kariyawasam H, Su S, De Paoli-Iseppi R, Prawer YDJ, Clark MB, Breslin K, Iminitoff M, Blewitt ME, Law CW, Ritchie ME. The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools. NAR Genom Bioinform 2021; 3:lqab028. [PMID: 33937765 PMCID: PMC8074342 DOI: 10.1093/nargab/lqab028] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/22/2020] [Revised: 02/26/2021] [Accepted: 03/30/2021] [Indexed: 12/12/2022] Open
Abstract
Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequencing error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read-specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read-specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.
Affiliation(s)
- Xueyi Dong
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Luyi Tian
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Quentin Gouil
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Hasaru Kariyawasam
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Shian Su
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Ricardo De Paoli-Iseppi
  - Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Yair David Joseph Prawer
  - Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Michael B Clark
  - Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, Victoria 3010, Australia
- Kelsey Breslin
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Megan Iminitoff
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Marnie E Blewitt
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Charity W Law
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
- Matthew E Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria 3052, Australia
|
30
|
Witte KE, Hertel O, Windmöller BA, Helweg LP, Höving AL, Knabbe C, Busche T, Greiner JFW, Kalinowski J, Noll T, Mertzlufft F, Beshay M, Pfitzenmaier J, Kaltschmidt B, Kaltschmidt C, Banz-Jansen C, Simon M. Nanopore Sequencing Reveals Global Transcriptome Signatures of Mitochondrial and Ribosomal Gene Expressions in Various Human Cancer Stem-like Cell Populations. Cancers (Basel) 2021; 13:cancers13051136. [PMID: 33800955 PMCID: PMC7962028 DOI: 10.3390/cancers13051136] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/29/2021] [Revised: 03/03/2021] [Accepted: 03/04/2021] [Indexed: 12/13/2022] Open
Abstract
Simple Summary: Cancer is the leading cause of death in the industrialized world. In particular, so-called cancer stem cells (CSCs) play a crucial role in disease progression, as they are known to contribute to tumor growth and metastasis. Thus, CSCs are heavily investigated in a broad range of cancers. Nevertheless, global transcriptomic profiling of CSC populations derived from different tumor types is rare. We established three CSC populations from tumors in the uterus, brain, lung, and prostate and assessed their global transcriptomes using nanopore full-length cDNA sequencing, a new technique to assess insights into global gene profile. We observed common expression in all CSCs for distinct genes encoding proteins for organelles, such as ribosomes, mitochondria, and proteasomes. Additionally, we detected high expressions of inflammation- and immunity-related genes. Conclusively, we observed high similarities between all CSCs independent of their tumor of origin, which may build the basis for identifying novel therapeutic strategies targeting CSCs.
Abstract: Cancer stem cells (CSCs) are crucial mediators of tumor growth, metastasis, therapy resistance, and recurrence in a broad variety of human cancers. Although their biology is increasingly investigated within the distinct types of cancer, direct comparisons of CSCs from different tumor types allowing comprehensive mechanistic insights are rarely assessed. In the present study, we isolated CSCs from endometrioid carcinomas, glioblastoma multiforme as well as adenocarcinomas of lung and prostate and assessed their global transcriptomes using full-length cDNA nanopore sequencing. Despite the expression of common CSC markers, principal component analysis showed a distinct separation of the CSC populations into three clusters independent of the specific type of tumor. However, GO-term and KEGG pathway enrichment analysis revealed upregulated genes related to ribosomal biosynthesis, the mitochondrion, oxidative phosphorylation, and glycolytic pathways, as well as the proteasome, suggesting a great extent of metabolic flexibility in CSCs. Interestingly, the GO term “NF-kB binding” was likewise found to be elevated in all investigated CSC populations. In summary, we here provide evidence for high global transcriptional similarities between CSCs from various tumors, which particularly share upregulated gene expression associated with mitochondrial and ribosomal activity. Our findings may build the basis for identifying novel therapeutic strategies targeting CSCs.
Affiliation(s)
- Kaya E. Witte
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Correspondence: Tel.: +49-521-106-5629
| | - Oliver Hertel
- Department of Cell Culture Technology, Faculty of Technology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (O.H.); (T.N.)
- Center for Biotechnology-CeBiTec, University of Bielefeld, Universitätsstrasse 27, 33699 Bielefeld, Germany; (T.B.); (J.K.)
| | - Beatrice A. Windmöller
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
| | - Laureen P. Helweg
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
| | - Anna L. Höving
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Heart and Diabetes Centre NRW, Institute for Laboratory and Transfusion Medicine, Ruhr-University Bochum, 32545 Bad Oeynhausen, Germany
| | - Cornelius Knabbe
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Heart and Diabetes Centre NRW, Institute for Laboratory and Transfusion Medicine, Ruhr-University Bochum, 32545 Bad Oeynhausen, Germany
| | - Tobias Busche
- Center for Biotechnology-CeBiTec, University of Bielefeld, Universitätsstrasse 27, 33699 Bielefeld, Germany; (T.B.); (J.K.)
| | - Johannes F. W. Greiner
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
| | - Jörn Kalinowski
- Center for Biotechnology-CeBiTec, University of Bielefeld, Universitätsstrasse 27, 33699 Bielefeld, Germany; (T.B.); (J.K.)
| | - Thomas Noll
- Department of Cell Culture Technology, Faculty of Technology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (O.H.); (T.N.)
- Center for Biotechnology-CeBiTec, University of Bielefeld, Universitätsstrasse 27, 33699 Bielefeld, Germany; (T.B.); (J.K.)
| | - Fritz Mertzlufft
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Scientific Director of the Protestant Hospital of Bethel Foundation, University Medical School OWL at Bielefeld, Bielefeld University, Campus Bielefeld-Bethel, Maraweg 21, 33699 Bielefeld, Germany
| | - Morris Beshay
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Department for Thoracic Surgery and Pneumology, Protestant Hospital of Bethel Foundation, University Medical School OWL at Bielefeld, Bielefeld University, Campus Bielefeld-Bethel, Burgsteig 13, 33699 Bielefeld, Germany
| | - Jesco Pfitzenmaier
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Department of Urology and Center for Computer-Assisted and Robotic Urology, Protestant Hospital of Bethel Foundation, University Medical School OWL at Bielefeld, Bielefeld University, Campus Bielefeld-Bethel, Burgsteig 13, 33699 Bielefeld, Germany
| | - Barbara Kaltschmidt
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Molecular Neurobiology, Faculty of Biology, Bielefeld University, Universitätsstrasse 25, 33699 Bielefeld, Germany
| | - Christian Kaltschmidt
- Department of Cell Biology, Faculty of Biology, University of Bielefeld, Universitätsstrasse 25, 33699 Bielefeld, Germany; (B.A.W.); (L.P.H.); (A.L.H.); (J.F.W.G.); (B.K.); (C.K.)
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
| | - Constanze Banz-Jansen
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Department of Gynecology and Obstetrics, and Perinatal Center, Protestant Hospital of Bethel Foundation, University Medical School OWL at Bielefeld, Bielefeld University, Campus Bielefeld-Bethel, Burgsteig 13, 33699 Bielefeld, Germany
| | - Matthias Simon
- Forschungsverbund BioMedizin Bielefeld, OWL (FBMB e.V.), Maraweg 21, 33699 Bielefeld, Germany; (C.K.); (F.M.); (M.B.); (J.P.); (C.B.-J.); (M.S.)
- Department of Neurosurgery and Epilepsy Surgery, Protestant Hospital of Bethel Foundation, University Medical School OWL at Bielefeld, Bielefeld University, Campus Bielefeld-Bethel, Burgsteig 13, 33699 Bielefeld, Germany
|
31
|
Identification of Dominant Transcripts in Oxidative Stress Response by a Full-Length Transcriptome Analysis. Mol Cell Biol 2021; 41:MCB.00472-20. [PMID: 33168698 DOI: 10.1128/mcb.00472-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022] Open
Abstract
Our body responds to environmental stress by changing the expression levels of a series of cytoprotective enzymes/proteins through multilayered regulatory mechanisms, including the KEAP1-NRF2 system. While NRF2 upregulates the expression of many cytoprotective genes, there are fundamental limitations in short-read RNA sequencing (RNA-Seq), resulting in confusion regarding interpreting the effectiveness of cytoprotective gene induction at the transcript level. To precisely delineate isoform usage in the stress response, we conducted independent full-length transcriptome profiling (isoform sequencing; Iso-Seq) analyses of lymphoblastoid cells from three volunteers under normal and electrophilic stress-induced conditions. We first determined the first exon usage in KEAP1 and NFE2L2 (encoding NRF2) and found the presence of transcript diversity. We then examined changes in isoform usage of NRF2 target genes under stress conditions and identified a few isoforms dominantly expressed in the majority of NRF2 target genes. The expression levels of isoforms determined by Iso-Seq analyses showed striking differences from those determined by short-read RNA-Seq; the latter could be misleading concerning the abundance of transcripts. These results support that transcript usage is tightly regulated to produce functional proteins under electrophilic stress. Our present study strongly argues that there are important benefits that can be achieved by long-read transcriptome sequencing.
|
32
|
Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 2021; 12:2. [PMID: 33397972 PMCID: PMC7782715 DOI: 10.1038/s41467-020-20340-8] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 05/27/2020] [Accepted: 11/25/2020] [Indexed: 01/24/2023] Open
Abstract
Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference-bias, error correction is a crucial step towards the reconstruction of the sequenced transcripts and downstream sequence analysis of transcripts. In this paper, we present a novel computational method to error correct ONT cDNA sequencing data, called isONcorrect. IsONcorrect is able to jointly use all isoforms from a gene during error correction, thereby allowing it to correct reads at low sequencing depths. We are able to obtain a median accuracy of 98.9-99.6%, demonstrating the feasibility of applying cost-effective cDNA full transcript length sequencing for reference-free transcriptome analysis.
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA
|
33
|
Oka M, Xu L, Suzuki T, Yoshikawa T, Sakamoto H, Uemura H, Yoshizawa AC, Suzuki Y, Nakatsura T, Ishihama Y, Suzuki A, Seki M. Aberrant splicing isoforms detected by full-length transcriptome sequencing as transcripts of potential neoantigens in non-small cell lung cancer. Genome Biol 2021; 22:9. [PMID: 33397462 PMCID: PMC7780684 DOI: 10.1186/s13059-020-02240-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 08/13/2020] [Accepted: 12/14/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Long-read sequencing of full-length cDNAs enables the detection of structures of aberrant splicing isoforms in cancer cells. These isoforms are occasionally translated, presented by HLA molecules, and recognized as neoantigens. This study used a long-read sequencer (MinION) to construct a comprehensive catalog of aberrant splicing isoforms in non-small-cell lung cancers, by which novel isoforms and potential neoantigens are identified. RESULTS: Full-length cDNA sequencing is performed using 22 cell lines, and a total of 2021 novel splicing isoforms are identified. The protein expression of some of these isoforms is then validated by proteome analysis. Ablations of a nonsense-mediated mRNA decay (NMD) factor, UPF1, and a splicing factor, SF3B1, are found to increase the proportion of aberrant transcripts. NetMHC evaluation of the binding affinities to each type of HLA molecule reveals that some of the isoforms potentially generate neoantigen candidates. We also identify aberrant splicing isoforms in seven non-small-cell lung cancer specimens. An enzyme-linked immune absorbent spot assay indicates that approximately half the peptide candidates have the potential to activate T cell responses through their interaction with HLA molecules. Finally, we estimate the number of isoforms in The Cancer Genome Atlas (TCGA) datasets by referring to the constructed catalog and found that disruption of NMD factors is significantly correlated with the number of splicing isoforms found in the TCGA-Lung Adenocarcinoma data collection. CONCLUSIONS: Our results indicate that long-read sequencing of full-length cDNAs is essential for the precise identification of aberrant transcript structures in cancer cells.
Affiliation(s)
- Miho Oka
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Ono Pharmaceutical Co., Ltd., Ibaraki, Japan
| | - Liu Xu
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Toshihiro Suzuki
- General Medical Education and Research Center, Teikyo University, Tokyo, Japan
- Division of Cancer Immunotherapy, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
| | - Toshiaki Yoshikawa
- Division of Cancer Immunotherapy, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
| | - Hiromi Sakamoto
- Department of Clinical Genomics, National Cancer Center Research Institute, Tokyo, Japan
| | - Hayato Uemura
- Department of Molecular and Cellular BioAnalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | - Akiyasu C. Yoshizawa
- Department of Molecular and Cellular BioAnalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Tetsuya Nakatsura
- Division of Cancer Immunotherapy, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Chiba, Japan
| | - Yasushi Ishihama
- Department of Molecular and Cellular BioAnalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
| | - Ayako Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
|
34
|
Abstract
Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.
Affiliation(s)
- Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, United Kingdom
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, United Kingdom
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, United Kingdom
|
35
|
Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, Baymuradov UK, Graham K, Litton C, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Tanaka FY, Adenekan P, O'Neill E, Cherry JM. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 2020; 48:D882-D889. [PMID: 31713622 PMCID: PMC7061942 DOI: 10.1093/nar/gkz1062] [Citation(s) in RCA: 365] [Impact Index Per Article: 91.3] [Received: 09/13/2019] [Revised: 10/18/2019] [Accepted: 10/25/2019] [Indexed: 02/06/2023] Open
Abstract
The Encyclopedia of DNA Elements (ENCODE) is an ongoing collaborative research project aimed at identifying all the functional elements in the human and mouse genomes. Data generated by the ENCODE consortium are freely accessible at the ENCODE portal (https://www.encodeproject.org/), which is developed and maintained by the ENCODE Data Coordinating Center (DCC). Since the initial portal release in 2013, the ENCODE DCC has updated the portal to make ENCODE data more findable, accessible, interoperable and reusable. Here, we report on recent updates, including new ENCODE data and assays, ENCODE uniform data processing pipelines, new visualization tools, a dataset cart feature, unrestricted public access to ENCODE data on the cloud (Amazon Web Services open data registry, https://registry.opendata.aws/encode-project/) and more comprehensive tutorials and documentation.
Affiliation(s)
- Yunhai Luo
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Benjamin C Hitz
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Idan Gabdank
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jason A Hilton
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Meenakshi S Kagda
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Bonita Lam
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Zachary Myers
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Paul Sud
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jennifer Jou
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Khine Lin
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | | | - Keenan Graham
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Casey Litton
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - J Seth Strattan
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Otto Jolanki
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Jin-Wook Lee
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Forrest Y Tanaka
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Philip Adenekan
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - Emma O'Neill
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5477, USA
|
36
|
The genome of the sea anemone Actinia equina (L.): Meiotic toolkit genes and the question of sexual reproduction. Mar Genomics 2020; 53:100753. [PMID: 32057717 DOI: 10.1016/j.margen.2020.100753] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/07/2019] [Revised: 02/05/2020] [Accepted: 02/05/2020] [Indexed: 12/15/2022]
Abstract
The beadlet anemone Actinia equina (L.) (Cnidaria: Anthozoa: Actiniaria: Actiniidae) is one of the most familiar organisms of the North European intertidal zone. Once considered a single, morphologically variable species across northern Europe, it is now recognised as one member of a variable species complex. Previous studies of distribution, aggression, allozymes and mitochondrial DNA suggest that the diversity in form and colour within A. equina may hide still unrecognised species diversity. To empower further study of A. equina population genetics and systematics, we sequenced (PacBio Sequel) the genome of a single A. equina individual to produce a high-quality genome assembly (contig N50 = 492,607 bp, 1485 contigs, number of protein coding genes = 47,671, 97% BUSCO completeness). There is debate as to whether A. equina reproduces solely asexually, since no reliable, consistent evidence of sexual reproduction has been found. To gain further insight, we examined the genome for evidence of a 'meiotic toolkit' - genes believed to be found consistently in sexually reproducing organisms - and demonstrate that the A. equina genome appears not to have this full complement. Additionally, Smudgeplot analysis, coupled with high haplotype diversity, indicates this genome assembly to be of ambiguous ploidy, suggesting that A. equina may not be diploid. The suggested polyploid nature of this species coupled with the deficiency in meiotic toolkit genes, indicates that further field and laboratory studies of this species is warranted to understand how this species reproduces and what role ploidy may play in speciation within this speciose genus.
|
37
|
Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019; 20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 759] [Impact Index Per Article: 151.8] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022] Open
Abstract
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
| | - Aleksey V. Zimin
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| | - Geo M. Pertea
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| | - Roham Razaghi
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
| | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
| | - Mihaela Pertea
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
|
38
|
Ruiz-Reche A, Srivastava A, Indi JA, de la Rubia I, Eyras E. ReorientExpress: reference-free orientation of nanopore cDNA reads with deep learning. Genome Biol 2019; 20:260. [PMID: 31783882 PMCID: PMC6883653 DOI: 10.1186/s13059-019-1884-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/12/2019] [Accepted: 11/07/2019] [Indexed: 12/18/2022] Open
Abstract
We describe ReorientExpress, a method to perform reference-free orientation of transcriptomic long sequencing reads. ReorientExpress uses deep learning to correctly predict the orientation of the majority of reads, and in particular when trained on a closely related species or in combination with read clustering. ReorientExpress enables long-read transcriptomics in non-model organisms and samples without a genome reference without using additional technologies and is available at https://github.com/comprna/reorientexpress.
Affiliation(s)
| | - Akanksha Srivastava
- The John Curtin School of Medical Research, Australian National University, Acton ACT, Canberra, 2601, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Acton ACT, Canberra, 2601, Australia
| | - Joel A Indi
- Pompeu Fabra University, E08003, Barcelona, Spain
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
| | | | - Eduardo Eyras
- The John Curtin School of Medical Research, Australian National University, Acton ACT, Canberra, 2601, Australia
- EMBL Australia Partner Laboratory Network and the Australian National University, Acton ACT, Canberra, 2601, Australia
- IMIM - Hospital del Mar Medical Research Institute, E08003, Barcelona, Spain
|
39
|
Hardwick SA, Joglekar A, Flicek P, Frankish A, Tilgner HU. Getting the Entire Message: Progress in Isoform Sequencing. Front Genet 2019; 10:709. [PMID: 31475029 PMCID: PMC6706457 DOI: 10.3389/fgene.2019.00709] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/31/2019] [Accepted: 07/04/2019] [Indexed: 01/31/2023] Open
Abstract
The advent of second-generation sequencing and its application to RNA sequencing have revolutionized the field of genomics by allowing quantification of gene expression, as well as the definition of transcription start/end sites, exons, splice sites and RNA editing sites. However, due to the sequencing of fragments of cDNAs, these methods have not given a reliable picture of complete RNA isoforms. Third-generation sequencing has filled this gap and allows end-to-end sequencing of entire RNA/cDNA molecules. This approach to transcriptomics has been a "niche" technology for a couple of years but now is becoming mainstream with many different applications. Here, we review the background and progress made to date in this rapidly growing field. We start by reviewing the progressive realization that alternative splicing is omnipresent. We then focus on long-noncoding RNA isoforms and the distinct combination patterns of exons in noncoding and coding genes. We consider the implications of the recent technologies of direct RNA sequencing and single-cell isoform RNA sequencing. Finally, we discuss the parameters that define the success of long-read RNA sequencing experiments and strategies commonly used to make the most of such data.
Affiliation(s)
- Simon A. Hardwick
- Brain and Mind Research Institute, Weill Cornell Medicine, NY, United States
- Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Anoushka Joglekar
- Brain and Mind Research Institute, Weill Cornell Medicine, NY, United States
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Hagen U. Tilgner
- Brain and Mind Research Institute, Weill Cornell Medicine, NY, United States
|