1
Bessière C, Xue H, Guibert B, Boureux A, Rufflé F, Viot J, Chikhi R, Salson M, Marchet C, Commes T, Gautheret D. Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data. Genome Biol 2024; 25:266. [PMID: 39390592 PMCID: PMC11468207 DOI: 10.1186/s13059-024-03413-5] [Received: 03/26/2024] [Accepted: 10/01/2024] [Indexed: 10/12/2024] Open
Abstract
Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but without enabling direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples, we show that k-mer indexing offers a powerful means to reveal non-reference sequences, and variant RNAs induced by specific gene alterations, for instance in splicing factors.
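The decomposition described in this abstract can be illustrated with a minimal Python sketch. It is not Transipedia's implementation: the function names, the in-memory `Counter` standing in for the index, and the use of the median k-mer count as the abundance summary are all assumptions made for illustration, and reverse-complement (canonical) k-mers are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reads, k):
    """Count k-mers across a list of reads (a toy stand-in for a real index)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def quantify(query, index, k):
    """Summarise a query's abundance as the median count of its k-mers.

    The median is an assumed summary statistic, chosen here because it is
    robust to a few k-mers that are absent or shared with other transcripts.
    """
    values = [index.get(km, 0) for km in kmers(query, k)]
    return median(values) if values else 0


# Hypothetical toy data: three short "reads" and one query sequence.
index = build_index(["ACGTAC", "ACGTAC", "CGTACG"], k=4)
print(quantify("ACGTACG", index, k=4))  # 2.5
```

A query shorter than `k` yields no k-mers, so the sketch returns 0 rather than raising an error.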
Affiliation(s)
- Chloé Bessière
- IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France
- CRCT, Inserm, CNRS, Université Toulouse III-Paul Sabatier, Centre de Recherches en Cancérologie de Toulouse, Toulouse, France
- Haoliang Xue
- I2BC, Université Paris-Saclay, CNRS, CEA, Gif sur Yvette, France
- Benoit Guibert
- IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France
- Anthony Boureux
- IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France
- Florence Rufflé
- IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France
- Julien Viot
- Department of Medical Oncology, Biotechnology and Immuno-Oncology Platform, University Hospital of Besançon, Besançon, France
- INSERM, EFS BFC, UMR1098, RIGHT, University of Franche-Comté, Interactions Greffon-Hôte-Tumeur/Ingénierie Cellulaire et Génique, Besançon, France
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, Paris, France
- Mikaël Salson
- Université de Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
- Camille Marchet
- Université de Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000, Lille, France
- Thérèse Commes
- IRMB, INSERM U1183, Hopital Saint-Eloi, Universite de Montpellier, Montpellier, France.
- Daniel Gautheret
- I2BC, Université Paris-Saclay, CNRS, CEA, Gif sur Yvette, France.
2
Kumari P, Kaur M, Dindhoria K, Ashford B, Amarasinghe SL, Thind AS. Advances in long-read single-cell transcriptomics. Hum Genet 2024; 143:1005-1020. [PMID: 38787419 PMCID: PMC11485027 DOI: 10.1007/s00439-024-02678-x] [Received: 11/20/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024]
Abstract
Long-read single-cell transcriptomics (scRNA-Seq) is revolutionizing the way we profile heterogeneity in disease. Traditional short-read scRNA-Seq methods are limited in their ability to provide complete transcript coverage, resolve isoforms, and identify novel transcripts. The scRNA-Seq protocols developed for long-read sequencing platforms overcome these limitations by enabling the characterization of full-length transcripts. Long-read scRNA-Seq techniques initially suffered from poor accuracy compared to short-read scRNA-Seq. However, with improvements in accuracy, accessibility, and cost efficiency, long reads are gaining popularity in the field of scRNA-Seq. This review details the advances in long-read scRNA-Seq, with an emphasis on library preparation protocols and downstream bioinformatics analysis tools.
Affiliation(s)
- Pallawi Kumari
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Manmeet Kaur
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
- Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Bruce Ashford
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia
- Shanika L Amarasinghe
- Monash Biomedical Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- Walter and Eliza Hall Institute of Medical Research, 1G, Royal Parade, Parkville, VIC, 3025, Australia
- Amarinder Singh Thind
- Illawarra Shoalhaven Local Health District (ISLHD), NSW Health, Wollongong, NSW, Australia.
- The School of Chemistry and Molecular Bioscience (SCMB), University of Wollongong, Loftus St, Wollongong, NSW, 2500, Australia.
3
Zhu XT, Sanz-Jimenez P, Ning XT, Tahir Ul Qamar M, Chen LL. Direct RNA sequencing in plants: Practical applications and future perspectives. Plant Communications 2024:101064. [PMID: 39155503 DOI: 10.1016/j.xplc.2024.101064] [Received: 05/13/2024] [Revised: 07/17/2024] [Accepted: 08/14/2024] [Indexed: 08/20/2024]
Abstract
The transcriptome serves as a bridge that links genomic variation to phenotypic diversity. A vast number of studies using next-generation RNA sequencing (RNA-seq) over the last 2 decades have emphasized the essential roles of the plant transcriptome in response to developmental and environmental conditions, providing numerous insights into the dynamic changes, evolutionary traces, and elaborate regulation of the plant transcriptome. With substantial improvement in accuracy and throughput, direct RNA sequencing (DRS) has emerged as a new and powerful sequencing platform for precise detection of native and full-length transcripts, overcoming many limitations such as read length and PCR bias that are inherent to short-read RNA-seq. Here, we review recent advances in dissecting the complexity and diversity of plant transcriptomes using DRS as the main technological approach, covering many aspects of RNA metabolism, including novel isoforms, poly(A) tails, and RNA modification, and we propose a comprehensive workflow for processing of plant DRS data. Many challenges to the application of DRS in plants, such as the need for machine learning tools tailored to plant transcriptomes, remain to be overcome, and together we outline future biological questions that can be addressed by DRS, such as allele-specific RNA modification. This technology provides convenient support on which the connection of distinct RNA features is tightly built, sustainably refining our understanding of the biological functions of the plant transcriptome.
Affiliation(s)
- Xi-Tong Zhu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China.
- Pablo Sanz-Jimenez
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xiao-Tong Ning
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Muhammad Tahir Ul Qamar
- Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
- Ling-Ling Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China.
4
Byrne A, Le D, Sereti K, Menon H, Vaidya S, Patel N, Lund J, Xavier-Magalhães A, Shi M, Liang Y, Sterne-Weiler T, Modrusan Z, Stephenson W. Single-cell long-read targeted sequencing reveals transcriptional variation in ovarian cancer. Nat Commun 2024; 15:6916. [PMID: 39134520 PMCID: PMC11319652 DOI: 10.1038/s41467-024-51252-6] [Received: 07/27/2023] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
Single-cell RNA sequencing predominantly employs short-read sequencing to characterize cell types, states and dynamics; however, it is inadequate for comprehensive characterization of RNA isoforms. Long-read sequencing technologies enable single-cell RNA isoform detection but are hampered by lower throughput and unintended sequencing of artifacts. Here we develop Single-cell Targeted Isoform Long-Read Sequencing (scTaILoR-seq), a hybridization capture method which targets over a thousand genes of interest, improving the median number of on-target transcripts per cell by 29-fold. We use scTaILoR-seq to identify and quantify RNA isoforms from ovarian cancer cell lines and primary tumors, yielding 10,796 single-cell transcriptomes. Using long-read variant calling we reveal associations of expressed single nucleotide variants (SNVs) with alternative transcript structures. Phasing of SNVs across transcripts enables the measurement of allelic imbalance within distinct cell populations. Overall, scTaILoR-seq is a long-read targeted RNA sequencing method and analytical framework for exploring transcriptional variation at single-cell resolution.
Affiliation(s)
- Ashley Byrne
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Daniel Le
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Kostianna Sereti
- Department of Discovery Oncology, Genentech, South San Francisco, CA, USA
- Hari Menon
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Samir Vaidya
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Neha Patel
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Jessica Lund
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Ana Xavier-Magalhães
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Minyi Shi
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Yuxin Liang
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Timothy Sterne-Weiler
- Department of Discovery Oncology, Genentech, South San Francisco, CA, USA
- Department of Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
- Zora Modrusan
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA.
- William Stephenson
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA.
5
Yang L, Zhang X, Wang F, Zhang L, Li J, Yue JX. NanoTrans: an integrated computational framework for comprehensive transcriptome analysis with nanopore direct RNA sequencing. J Genet Genomics 2024:S1673-8527(24)00183-8. [PMID: 39004399 DOI: 10.1016/j.jgg.2024.07.007] [Received: 01/11/2024] [Revised: 07/03/2024] [Accepted: 07/04/2024] [Indexed: 07/16/2024]
Abstract
Nanopore direct RNA sequencing (DRS) provides the direct access to native RNA strands with full-length information, shedding light on rich qualitative and quantitative properties of gene expression profiles. Here with NanoTrans, we present an integrated computational framework that comprehensively covers all major DRS-based application scopes, including isoform clustering and quantification, poly(A) tail length estimation, RNA modification profiling, and fusion gene detection. In addition to its merit in providing such a streamlined one-stop solution, NanoTrans also shines in its workflow-orientated modular design, batch processing capability, all-in-one tabular and graphic report output, as well as automatic installation and configuration supports. Finally, by applying NanoTrans to real DRS datasets of yeast, Arabidopsis, as well as human embryonic kidney and cancer cell lines, we further demonstrate its utility, effectiveness, and efficacy across a wide range of DRS-based application settings.
Affiliation(s)
- Ludong Yang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Xinxin Zhang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China
- Fan Wang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China; Department of Medical Oncology, The Affiliated Huai'an No.1 People's Hospital of Nanjing Medical University, Huai'an, Jiangsu 223200, China
- Li Zhang
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China.
- Jing Li
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China.
- Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong 510060, China.
6
Chen S, Wang H, Zhang D, Chen R, Luo J. Readon: a novel algorithm to identify read-through transcripts with long-read sequencing data. Bioinformatics 2024; 40:btae336. [PMID: 38808568 PMCID: PMC11162696 DOI: 10.1093/bioinformatics/btae336] [Received: 01/11/2024] [Revised: 04/30/2024] [Accepted: 05/26/2024] [Indexed: 05/30/2024] Open
Abstract
MOTIVATION: There are many clustered transcriptionally active regions in the human genome, in which the transcription complex cannot immediately terminate transcription at the upstream gene termination site, but instead continues to transcribe intergenic regions and downstream genes, resulting in read-through transcripts. Several studies have demonstrated the regulatory roles of read-through transcripts in tumorigenesis and development. However, limited by the read length of next-generation sequencing, discovery of read-through transcripts has been slow. For long but also erroneous third-generation sequencing data, this study developed a novel minimizer sketch algorithm to accurately and quickly identify read-through transcripts.
RESULTS: Readon initially splits the reference sequence into distinct active regions. It employs a sliding window approach within each region, calculates minimizers, and constructs the specialized structured arrays for query indexing. Following initial alignment anchor screening of candidate read-through transcripts, further confirmation steps are executed. Comparative assessments against existing software reveal Readon's superior performance on both simulated and validated real data. Additionally, two downstream tools are provided: one for predicting whether a read-through transcript is likely to undergo nonsense-mediated decay or encodes a protein, and another for visualizing splicing patterns.
AVAILABILITY AND IMPLEMENTATION: Readon is freely available on GitHub (https://github.com/Bulabula45/Readon).
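The abstract names a minimizer sketch computed over sliding windows. A generic version of that idea can be sketched in Python as follows. This is not Readon's code: the function name `minimizers`, the parameters `k` (k-mer length) and `w` (number of consecutive k-mers per window), and the lexicographic ordering of k-mers are assumptions; practical tools usually order k-mers by a hash value instead.

```python
def minimizers(seq, k, w):
    """Return the sorted (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically smallest
    k-mer is selected (ties go to the leftmost position). Adjacent windows
    often select the same k-mer, so the result is a sparse sketch of seq.
    """
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kms) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kms[i], i))
        picked.add((best, kms[best]))
    return sorted(picked)


# Hypothetical toy sequence; real references are far longer.
print(minimizers("ACGTTGCA", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (5, 'GCA')]
```

Two sequences that share a long enough exact stretch select the same minimizers inside it, which is what lets a minimizer index propose alignment anchors without comparing every k-mer.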
Affiliation(s)
- Siang Chen
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Wang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Dongdong Zhang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Runsheng Chen
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jianjun Luo
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
7
Gupta P, O’Neill H, Wolvetang E, Chatterjee A, Gupta I. Advances in single-cell long-read sequencing technologies. NAR Genom Bioinform 2024; 6:lqae047. [PMID: 38774511 PMCID: PMC11106032 DOI: 10.1093/nargab/lqae047] [Received: 02/01/2024] [Revised: 04/18/2024] [Accepted: 04/29/2024] [Indexed: 05/24/2024] Open
Abstract
With an increase in accuracy and throughput of long-read sequencing technologies, they are rapidly being assimilated into the single-cell sequencing pipelines. For transcriptome sequencing, these techniques provide RNA isoform-level information in addition to the gene expression profiles. Long-read sequencing technologies not only help in uncovering complex patterns of cell-type specific splicing, but also offer unprecedented insights into the origin of cellular complexity and thus potentially new avenues for drug development. Additionally, single-cell long-read DNA sequencing enables high-quality assemblies, structural variant detection, haplotype phasing, resolving high-complexity regions, and characterization of epigenetic modifications. Given that significant progress has primarily occurred in single-cell RNA isoform sequencing (scRiso-seq), this review will delve into these advancements in depth and highlight the practical considerations and operational challenges, particularly pertaining to downstream analysis. We also aim to offer a concise introduction to complementary technologies for single-cell sequencing of the genome, epigenome and epitranscriptome. We conclude by identifying certain key areas of innovation that may drive these technologies further and foster more widespread application in biomedical science.
Affiliation(s)
- Pallavi Gupta
- University of Queensland – IIT Delhi Research Academy, Hauz Khas, New Delhi 110016, India
- Australian Institute of Bioengineering and Nanotechnology (AIBN), The University of Queensland, St Lucia, QLD 4072, Australia
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Hannah O’Neill
- Department of Pathology, Dunedin School of Medicine, University of Otago, 58 Hanover Street, Dunedin 9054, New Zealand
- Ernst J Wolvetang
- Australian Institute of Bioengineering and Nanotechnology (AIBN), The University of Queensland, St Lucia, QLD 4072, Australia
- Aniruddha Chatterjee
- Department of Pathology, Dunedin School of Medicine, University of Otago, 58 Hanover Street, Dunedin 9054, New Zealand
- Ishaan Gupta
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
8
Wang W, Li Y, Ko S, Feng N, Zhang M, Liu JJ, Zheng S, Ren B, Yu YP, Luo JH, Tseng GC, Liu S. IFDlong: an isoform and fusion detector for accurate annotation and quantification of long-read RNA-seq data. bioRxiv 2024:2024.05.11.593690. [PMID: 38798496 PMCID: PMC11118288 DOI: 10.1101/2024.05.11.593690] [Indexed: 05/29/2024]
Abstract
Advancements in long-read transcriptome sequencing (long-RNA-seq) technology have revolutionized the study of isoform diversity. These full-length transcripts enhance the detection of various transcriptome structural variations, including novel isoforms, alternative splicing events, and fusion transcripts. By shifting the open reading frame or altering gene expressions, studies have proved that these transcript alterations can serve as crucial biomarkers for disease diagnosis and therapeutic targets. In this project, we proposed IFDlong, a bioinformatics and biostatistics tool to detect isoform and fusion transcripts using bulk or single-cell long-RNA-seq data. Specifically, the software performed gene and isoform annotation for each long-read, defined novel isoforms, quantified isoform expression by a novel expectation-maximization algorithm, and profiled the fusion transcripts. For evaluation, IFDlong pipeline achieved overall the best performance when compared with several existing tools in large-scale simulation studies. In both isoform and fusion transcript quantification, IFDlong is able to reach more than 0.8 Spearman's correlation with the truth, and more than 0.9 cosine similarity when distinguishing multiple alternative splicing events. In novel isoform simulation, IFDlong can successfully balance the sensitivity (higher than 90%) and specificity (higher than 90%). Furthermore, IFDlong has proved its accuracy and robustness in diverse in-house and public datasets on healthy tissues, cell lines and multiple types of diseases. Besides bulk long-RNA-seq, IFDlong pipeline has proved its compatibility to single-cell long-RNA-seq data. This new software may hold promise for significant impact on long-read transcriptome analysis. The IFDlong software is available at https://github.com/wenjiaking/IFDlong.
Affiliation(s)
- Wenjia Wang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Yuzhen Li
- Department of Surgery, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Sungjin Ko
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Ning Feng
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Manling Zhang
- Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Jia-Jun Liu
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Songyang Zheng
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Baoguo Ren
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Yan P. Yu
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Jian-Hua Luo
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA
- George C. Tseng
- Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Silvia Liu
- Department of Pathology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
- Pittsburgh Liver Research Center, University of Pittsburgh, Pittsburgh, PA
- Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA
- Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA
9
Xu SM, Cheng Y, Fisher H, Janitz M. Recent advances in the investigation of fusion RNAs and their role in molecular pathology of cancer. Int J Biochem Cell Biol 2024; 168:106529. [PMID: 38246262 DOI: 10.1016/j.biocel.2024.106529] [Received: 09/23/2023] [Revised: 01/16/2024] [Accepted: 01/17/2024] [Indexed: 01/23/2024]
Abstract
Gene fusions have had a significant role in the development of various types of cancer, oftentimes involved in oncogenic activities through dysregulation of gene expression or signalling pathways. Some cancer-associated chromosomal translocations can undergo backsplicing, resulting in fusion-circular RNAs, a more stable isoform immune to RNase degradation. This stability makes fusion circular RNAs a promising diagnostic biomarker for cancer. While the detection of linear fusion RNAs and their function in certain cancers have been described in the literature, fusion circular RNAs lag behind due to their low abundance in cancer cells. This review highlights current literature on the role of linear and circular fusion transcripts in cancer, the tools currently available for detecting these chimeric RNAs, their function, and how they play a role in tumorigenesis.
Affiliation(s)
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Harry Fisher
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
10
Qin Q, Popic V, Yu H, White E, Khorgade A, Shin A, Wienand K, Dondi A, Beerenwinkel N, Vazquez F, Al’Khafaji AM, Haas BJ. CTAT-LR-fusion: accurate fusion transcript identification from long and short read isoform sequencing at bulk or single cell resolution. bioRxiv 2024:2024.02.24.581862. [PMID: 38464114 PMCID: PMC10925146 DOI: 10.1101/2024.02.24.581862] [Indexed: 03/12/2024]
Abstract
Gene fusions are found as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential in cancer clinical diagnostics, prognostics, and for guiding therapeutic development. Most currently available methods for fusion transcript detection are compatible with Illumina RNA-seq involving highly accurate short read sequences. Recent advances in long read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single cell samples. Here we developed a new computational tool CTAT-LR-fusion to detect fusion transcripts from long read RNA-seq with or without companion short reads, with applications to bulk or single cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds fusion detection accuracy of alternative methods as benchmarked with simulated and real long read RNA-seq. Using short and long read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines, and to tumor single cells derived from a melanoma sample and three metastatic high grade serous ovarian carcinoma samples. In both bulk and in single cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.
Affiliation(s)
- Qian Qin
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Victoria Popic
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Houlin Yu
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Emily White
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Akanksha Khorgade
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Asa Shin
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Kirsty Wienand
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Arthur Dondi
- ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Niko Beerenwinkel
- ETH Zurich, Department of Biosystems Science and Engineering, Schanzenstrasse 44, 4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Schanzenstrasse 44, 4056 Basel, Switzerland
- Francisca Vazquez
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Aziz M. Al’Khafaji
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
- Brian J. Haas
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142 USA
11
Thomson AJ, Rehn JA, Heatley SL, Eadie LN, Page EC, Schutz C, McClure BJ, Sutton R, Dalla-Pozza L, Moore AS, Greenwood M, Kotecha RS, Fong CY, Yong ASM, Yeung DT, Breen J, White DL. Reproducible Bioinformatics Analysis Workflows for Detecting IGH Gene Fusions in B-Cell Acute Lymphoblastic Leukaemia Patients. Cancers (Basel) 2023; 15:4731. [PMID: 37835427 PMCID: PMC10571859 DOI: 10.3390/cancers15194731] [Received: 09/15/2023] [Accepted: 09/22/2023] [Indexed: 10/15/2023] Open
Abstract
B-cell acute lymphoblastic leukaemia (B-ALL) is characterised by diverse genomic alterations, the most frequent being gene fusions detected via transcriptomic analysis (mRNA-seq). Due to its hypervariable nature, gene fusions involving the Immunoglobulin Heavy Chain (IGH) locus can be difficult to detect with standard gene fusion calling algorithms and significant computational resources and analysis times are required. We aimed to optimize a gene fusion calling workflow to achieve best-case sensitivity for IGH gene fusion detection. Using Nextflow, we developed a simplified workflow containing the algorithms FusionCatcher, Arriba, and STAR-Fusion. We analysed samples from 35 patients harbouring IGH fusions (IGH::CRLF2 n = 17, IGH::DUX4 n = 15, IGH::EPOR n = 3) and assessed the detection rates for each caller, before optimizing the parameters to enhance sensitivity for IGH fusions. Initial results showed that FusionCatcher and Arriba outperformed STAR-Fusion (85-89% vs. 29% of IGH fusions reported). We found that extensive filtering in STAR-Fusion hindered IGH reporting. By adjusting specific filtering steps (e.g., read support, fusion fragments per million total reads), we achieved a 94% reporting rate for IGH fusions with STAR-Fusion. This analysis highlights the importance of filtering optimization for IGH gene fusion events, offering alternative workflows for difficult-to-detect high-risk B-ALL subtypes.
Affiliation(s)
- Ashlee J. Thomson
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Jacqueline A. Rehn
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Susan L. Heatley
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
- Australian and New Zealand Children’s Oncology Group (ANZCHOG), Clayton, VIC 3168, Australia
| | - Laura N. Eadie
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Elyse C. Page
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Caitlin Schutz
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Barbara J. McClure
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
| | - Rosemary Sutton
- Molecular Diagnostics, Children’s Cancer Institute, Kensington, NSW 2750, Australia;
| | - Luciano Dalla-Pozza
- The Cancer Centre for Children, The Children’s Hospital at Westmead, Westmead, NSW 2145, Australia;
| | - Andrew S. Moore
- Oncology Service, Children’s Health Queensland Hospital and Health Service, Brisbane, QLD 4101, Australia;
- Child Health Research Centre, The University of Queensland, Brisbane, QLD 4000, Australia
| | - Matthew Greenwood
- Department of Haematology and Transfusion Services, Royal North Shore Hospital, Sydney, NSW 2065, Australia;
- Faculty of Health and Medicine, University of Sydney, Sydney, NSW 2006, Australia
| | - Rishi S. Kotecha
- Department of Clinical Haematology, Oncology, Blood and Marrow Transplantation, Perth Children’s Hospital, Perth, WA 6009, Australia;
- Leukaemia Translational Research Laboratory, Telethon Kids Cancer Centre, Telethon Kids Institute, University of Western Australia, Perth, WA 6009, Australia
- Curtin Medical School, Curtin University, Perth, WA 6845, Australia
| | - Chun Y. Fong
- Department of Clinical Haematology, Austin Health, Heidelberg, VIC 3083, Australia;
| | - Agnes S. M. Yong
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia
- Division of Pathology & Laboratory, University of Western Australia Medical School, Perth, WA 6009, Australia
- Department of Haematology, Royal Perth Hospital, Perth, WA 6000, Australia
| | - David T. Yeung
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
- Haematology Department, Royal Adelaide Hospital and SA Pathology, Adelaide, SA 5000, Australia
| | - James Breen
- Black Ochre Data Labs, Indigenous Genomics, Telethon Kids Institute, Adelaide, SA 5000, Australia
- James Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
| | - Deborah L. White
- Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; (J.A.R.); (S.L.H.); (L.N.E.); (E.C.P.); (B.J.M.); (A.S.M.Y.); (D.T.Y.); (D.L.W.)
- Blood Cancer Program, Precision Cancer Medicine Theme, South Australian Health & Medical Research Institute (SAHMRI), Adelaide, SA 5000, Australia;
- Australian and New Zealand Children’s Oncology Group (ANZCHOG), Clayton, VIC 3168, Australia
- Australian Genomics Health Alliance (AGHA), The Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
|
12
|
Xia Y, Jin Z, Zhang C, Ouyang L, Dong Y, Li J, Guo L, Jing B, Shi Y, Miao S, Xi R. TAGET: a toolkit for analyzing full-length transcripts from long-read sequencing. Nat Commun 2023; 14:5935. [PMID: 37741817 PMCID: PMC10518008 DOI: 10.1038/s41467-023-41649-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/11/2022] [Accepted: 09/13/2023] [Indexed: 09/25/2023] Open
Abstract
Single-molecule Real-time Isoform Sequencing (Iso-seq) of transcriptomes by PacBio can generate very long and accurate reads, thus providing an ideal platform for full-length transcriptome analysis. We present an integrated computational toolkit named TAGET for Iso-seq full-length transcript data analyses, including transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential expression gene analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-seq dataset and newly sequenced Iso-seq datasets from tumor patients. TAGET gives significantly more precise novel splice site prediction and enables more accurate novel isoform and gene fusion discoveries, as validated by experimental validations and comparisons with RNA-seq data. We identify and experimentally validate a differential isoform usage gene ECM1, and further show that its isoform ECM1b may be a tumor-suppressor in laryngocarcinoma. Our results demonstrate that TAGET provides a valuable computational toolkit and can be applied to many full-length transcriptome studies.
Affiliation(s)
- Yuchao Xia
- College of Science, Beijing Information Science and Technology University, 100192, Beijing, China
- Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
| | - Zijie Jin
- Peking University International Cancer Institute, Health Science Center, Peking University, 100191, Beijing, China
- School of Mathematical Sciences, Peking University, 100871, Beijing, China
| | | | - Linkun Ouyang
- Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China
| | - Yuhao Dong
- Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
| | - Juan Li
- Department of Biomedical Engineering, College of Future Technology, Peking University, 100871, Beijing, China
| | - Lvze Guo
- Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
| | - Biyang Jing
- Beijing GeneX Health Co.,Ltd, 100195, Beijing, China
| | - Yang Shi
- BeiGene (Beijing) Co., Ltd., Beijing, China
| | - Susheng Miao
- Department of Head and Neck Surgery, Harbin Medical University Cancer Hospital, 150081, Harbin, China.
| | - Ruibin Xi
- School of Mathematical Sciences, Peking University, 100871, Beijing, China.
- Academy for Advanced Interdisciplinary Studies, Peking University, 100871, Beijing, China.
- Center for Statistical Science, Peking University, 100871, Beijing, China.
|
13
|
Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Hum Genomics 2023; 17:73. [PMID: 37553611 PMCID: PMC10410870 DOI: 10.1186/s40246-023-00522-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/10/2023] Open
Abstract
Long-read DNA sequencing technologies have been rapidly evolving in recent years, and their ability to assess large and complex regions of the genome makes them ideal for clinical applications in molecular diagnosis and therapy selection, thereby providing a valuable tool for precision medicine. In the third-generation sequencing duopoly, Oxford Nanopore Technologies and Pacific Biosciences work towards increasing the accuracy, throughput, and portability of long-read sequencing methods while trying to keep costs low. These trades have made long-read sequencing an attractive tool for use in research and clinical settings. This article provides an overview of current clinical applications and limitations of long-read sequencing and explores its potential for point-of-care testing and health care in remote settings.
Affiliation(s)
- Josephine B Oehler
- Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Townsville, Australia
- College of Medicine and Dentistry, James Cook University, Townsville, Australia
| | - Helen Wright
- Nursing and Midwifery, College of Healthcare Sciences, James Cook University, Townsville, Australia
| | - Zornitza Stark
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, Australia
- University of Melbourne, Melbourne, Australia
- Australian Genomics, Melbourne, Australia
| | - Andrew J Mallett
- College of Medicine and Dentistry, James Cook University, Townsville, Australia
- Department of Renal Medicine, Townsville University Hospital, Townsville, Australia
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
- Faculty of Medicine, The University of Queensland, Brisbane, Australia
| | - Ulf Schmitz
- Biomedical Sciences and Molecular Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Townsville, Australia.
- Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns, Australia.
- Computational BioMedicine Lab Centenary Institute, The University of Sydney, Camperdown, Australia.
- Faculty of Medicine & Health, The University of Sydney, Camperdown, Australia.
|
14
|
You Y, Prawer YDJ, De Paoli-Iseppi R, Hunt CPJ, Parish CL, Shim H, Clark MB. Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE. Genome Biol 2023; 24:66. [PMID: 37024980 PMCID: PMC10077662 DOI: 10.1186/s13059-023-02907-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 08/09/2022] [Accepted: 03/23/2023] [Indexed: 04/08/2023] Open
Abstract
Long-read single-cell RNA sequencing (scRNA-seq) enables the quantification of RNA isoforms in individual cells. However, long-read scRNA-seq using the Oxford Nanopore platform has largely relied upon matched short-read data to identify cell barcodes. We introduce BLAZE, which accurately and efficiently identifies 10x cell barcodes using only nanopore long-read scRNA-seq data. BLAZE outperforms the existing tools and provides an accurate representation of the cells present in long-read scRNA-seq when compared to matched short reads. BLAZE simplifies long-read scRNA-seq while improving the results, is compatible with downstream tools accepting a cell barcode file, and is available at https://github.com/shimlab/BLAZE .
Affiliation(s)
- Yupei You
- School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Yair D J Prawer
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Ricardo De Paoli-Iseppi
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Cameron P J Hunt
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Clare L Parish
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Heejung Shim
- School of Mathematics and Statistics/Melbourne Integrative Genomics, The University of Melbourne, Parkville, VIC, 3010, Australia.
| | - Michael B Clark
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia.
|
15
|
Wu S, Schmitz U. Single-cell and long-read sequencing to enhance modelling of splicing and cell-fate determination. Comput Struct Biotechnol J 2023; 21:2373-2380. [PMID: 37066125 PMCID: PMC10091034 DOI: 10.1016/j.csbj.2023.03.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/08/2022] [Revised: 03/13/2023] [Accepted: 03/13/2023] [Indexed: 04/03/2023] Open
Abstract
Single-cell sequencing technologies have revolutionised the life sciences and biomedical research. Single-cell sequencing provides high-resolution data on cell heterogeneity, allowing high-fidelity cell type identification, and lineage tracking. Computational algorithms and mathematical models have been developed to make sense of the data, compensate for errors and simulate the biological processes, which has led to breakthroughs in our understanding of cell differentiation, cell-fate determination and tissue cell composition. The development of long-read (a.k.a. third-generation) sequencing technologies has produced powerful tools for investigating alternative splicing, isoform expression (at the RNA level), genome assembly and the detection of complex structural variants (at the DNA level). In this review, we provide an overview of the recent advancements in single-cell and long-read sequencing technologies, with a particular focus on the computational algorithms that help in correcting, analysing, and interpreting the resulting data. Additionally, we review some mathematical models that use single-cell and long-read sequencing data to study cell-fate determination and alternative splicing, respectively. Moreover, we highlight the emerging opportunities in modelling cell-fate determination that result from the combination of single-cell and long-read sequencing technologies.
Affiliation(s)
- Siyuan Wu
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
- School of Mathematics, Monash University, Melbourne 3800, Victoria, Australia
| | - Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville 4811, Queensland, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Cairns 4870, Queensland, Australia
|
16
|
Transcriptome profiling for precision cancer medicine using shallow nanopore cDNA sequencing. Sci Rep 2023; 13:2378. [PMID: 36759549 PMCID: PMC9911782 DOI: 10.1038/s41598-023-29550-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Accepted: 02/06/2023] [Indexed: 02/11/2023] Open
Abstract
Transcriptome profiling is a mainstay of translational cancer research and is increasingly finding its way into precision oncology. While bulk RNA sequencing (RNA-seq) is widely available, high investment costs and long data return time are limiting factors for clinical applications. We investigated a portable nanopore long-read sequencing device (MinION, Oxford Nanopore Technologies) for transcriptome profiling of tumors. In particular, we investigated the impact of lower coverage than that of larger sequencing devices by comparing shallow nanopore RNA-seq data with short-read RNA-seq data generated using reversible dye terminator technology (Illumina) for ten samples representing four cancer types. Coupled with ShaNTi (Shallow Nanopore sequencing for Transcriptomics), a newly developed data processing pipeline, a turnaround time of five days was achieved. The correlation of normalized gene-level counts between nanopore and Illumina RNA-seq was high for MinION but not for very low-throughput Flongle flow cells (r = 0.89 and r = 0.24, respectively). A cost-saving approach based on multiplexing of four samples per MinION flow cell maintained a high correlation with Illumina data (r = 0.56-0.86). In addition, we compared the utility of nanopore and Illumina RNA-seq data for analysis tools commonly applied in translational oncology: (1) Shallow nanopore and Illumina RNA-seq were equally useful for inferring signaling pathway activities with PROGENy. (2) Highly expressed genes encoding kinases targeted by clinically approved small-molecule inhibitors were reliably identified by shallow nanopore RNA-seq. (3) In tumor microenvironment composition analysis, quanTIseq performed better than CIBERSORT, likely due to higher average expression of the gene set used for deconvolution. (4) Shallow nanopore RNA-seq was successfully applied to detect fusion genes using the JAFFAL pipeline. 
These findings suggest that shallow nanopore RNA-seq enables rapid and biologically meaningful transcriptome profiling of tumors, and warrants further exploration in precision cancer medicine studies.
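The platform comparison reported above rests on correlating normalized gene-level counts between nanopore and Illumina data. A minimal sketch of that kind of comparison follows. The abstract does not state the normalization or the correlation coefficient used, so counts-per-million scaling, the log2(x + 1) transform, and Pearson's r are assumptions here, and the count vectors are invented toy values rather than data from the study.

```python
import math


def cpm(counts):
    """Scale raw gene counts to counts per million (assumed normalization)."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy gene-level counts for the same five genes on two platforms.
nanopore = [120, 30, 0, 560, 15]
illumina = [2400, 500, 20, 9800, 410]

# Log-transform after normalization so highly expressed genes do not dominate.
log_nanopore = [math.log2(v + 1) for v in cpm(nanopore)]
log_illumina = [math.log2(v + 1) for v in cpm(illumina)]

r = pearson(log_nanopore, log_illumina)
print(f"Pearson r between platforms: {r:.2f}")
```

Shallow sequencing yields many zero counts for lowly expressed genes, which is one plausible reason why correlation drops at very low throughput.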
|
17
|
Chen Y, Wang Y, Chen W, Tan Z, Song Y, Chen H, Chong Z. Gene Fusion Detection and Characterization in Long-Read Cancer Transcriptome Sequencing Data with FusionSeeker. Cancer Res 2023; 83:28-33. [PMID: 36318117 PMCID: PMC9812290 DOI: 10.1158/0008-5472.can-22-1628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Revised: 09/07/2022] [Accepted: 10/28/2022] [Indexed: 01/05/2023]
Abstract
Gene fusions are prevalent in a wide array of cancer types with different frequencies. Long-read transcriptome sequencing technologies, such as PacBio, Iso-Seq, and Nanopore direct RNA sequencing, provide full-length transcript sequencing reads, which could facilitate detection of gene fusions. In this work, we developed a method, FusionSeeker, to comprehensively characterize gene fusions in long-read cancer transcriptome data and reconstruct accurate fused transcripts from raw reads. FusionSeeker identified gene fusions in both exonic and intronic regions, allowing comprehensive characterization of gene fusions in cancer transcriptomes. Fused transcript sequences were reconstructed with FusionSeeker by correcting sequencing errors in the raw reads through partial order alignment algorithm. Using these accurate transcript sequences, FusionSeeker refined gene fusion breakpoint positions and predicted breakpoints at single bp resolution. Overall, FusionSeeker will enable users to discover gene fusions accurately using long-read data, which can facilitate downstream functional analysis as well as improved cancer diagnosis and treatment. Significance: FusionSeeker is a new method to discover gene fusions and reconstruct fused transcript sequences in long-read cancer transcriptome sequencing data to help identify novel gene fusions important for tumorigenesis and progression.
Affiliation(s)
- Yu Chen
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | - Yiqing Wang
- Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
| | - Weisheng Chen
- Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | - Zhengzhi Tan
- Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, AL 35294, USA
| | - Yuwei Song
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA
| | | | - Herbert Chen
- Department of Surgery, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Department of Biomedical Engineering, School of Engineering, University of Alabama at Birmingham, AL 35294, USA
| | - Zechen Chong
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA; Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, AL 35294, USA. Corresponding author: Zechen Chong, Ph.D. Mailing address: THT 134, 1720 2 Ave S, Birmingham AL 35226. Phone: 205-801-7590
|
18
|
Dorney R, Dhungel BP, Rasko JEJ, Hebbard L, Schmitz U. Recent advances in cancer fusion transcript detection. Brief Bioinform 2022; 24:6918739. [PMID: 36527429 PMCID: PMC9851307 DOI: 10.1093/bib/bbac519] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/26/2022] [Revised: 10/11/2022] [Accepted: 10/31/2022] [Indexed: 12/23/2022] Open
Abstract
Extensive investigation of gene fusions in cancer has led to the discovery of novel biomarkers and therapeutic targets. To date, most studies have neglected chromosomal rearrangement-independent fusion transcripts and complex fusion structures such as double or triple-hop fusions, and fusion-circRNAs. In this review, we untangle fusion-related terminology and propose a classification system involving both gene and transcript fusions. We highlight the importance of RNA-level fusions and how long-read sequencing approaches can improve detection and characterization. Moreover, we discuss novel bioinformatic tools to identify fusions in long-read sequencing data and strategies to experimentally validate and functionally characterize fusion transcripts.
Affiliation(s)
- Ryley Dorney
- Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia; Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
| | - Bijay P Dhungel
- Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia; Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia; Centre for Tropical Bioinformatics and Molecular Biology, Australian Institute of Tropical Health and Medicine, James Cook University, Cairns 4878, Australia
| | - John E J Rasko
- Gene and Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia; Faculty of Medicine & Health, The University of Sydney, Camperdown, NSW 2006, Australia
| | - Lionel Hebbard
- Department of Molecular & Cell Biology, College of Public Health, Medical & Vet Sciences, James Cook University, Douglas, QLD 4811, Australia; Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Sydney, New South Wales, Australia
| | - Ulf Schmitz
- Corresponding author. Ulf Schmitz, Department of Molecular and Cell Biology, College of Public Health, Medical and Vet Sciences, James Cook University, Douglas, QLD 4811, Australia. E-mail:
|
19
|
Panagopoulos I, Heim S. Neoplasia-associated Chromosome Translocations Resulting in Gene Truncation. Cancer Genomics Proteomics 2022; 19:647-672. [PMID: 36316036 PMCID: PMC9620447 DOI: 10.21873/cgp.20349] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/06/2022] [Revised: 08/19/2022] [Accepted: 08/23/2022] [Indexed: 11/27/2022] Open
Abstract
Chromosomal translocations in cancer as well as benign neoplasias typically lead to the formation of fusion genes. Such genes may encode chimeric proteins when two protein-coding regions fuse in-frame, or they may result in deregulation of genes via promoter swapping or translocation of the gene into the vicinity of a highly active regulatory element. A less studied consequence of chromosomal translocations is the fusion of two breakpoint genes resulting in an out-of-frame chimera. The breaks then occur in one or both protein-coding regions forming a stop codon in the chimeric transcript shortly after the fusion point. Though the latter genetic events and mechanisms at first awoke little research interest, careful investigations have established them as neither rare nor inconsequential. In the present work, we review and discuss the truncation of genes in neoplastic cells resulting from chromosomal rearrangements, especially from seemingly balanced translocations.
Affiliation(s)
- Ioannis Panagopoulos
- Section for Cancer Cytogenetics, Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
| | - Sverre Heim
- Section for Cancer Cytogenetics, Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway
|
20
|
Kiyose H, Nakagawa H, Ono A, Aikata H, Ueno M, Hayami S, Yamaue H, Chayama K, Shimada M, Wong JH, Fujimoto A. Comprehensive analysis of full-length transcripts reveals novel splicing abnormalities and oncogenic transcripts in liver cancer. PLoS Genet 2022; 18:e1010342. [PMID: 35926060 PMCID: PMC9380957 DOI: 10.1371/journal.pgen.1010342] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/19/2022] [Revised: 08/16/2022] [Accepted: 07/14/2022] [Indexed: 12/24/2022] Open
Abstract
Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. Although long-reads sequencing technologies would enable the sequencing of full-length transcripts, the data analysis is difficult. In this study, we developed an analysis pipeline named SPLICE and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 46,663 transcripts from the protein-coding genes in the HCCs and the matched non-cancerous livers, of which 5,366 (11.5%) were novel. A comparison of expression levels identified 9,933 differentially expressed transcripts (DETs) in 4,744 genes. Interestingly, 746 genes with DETs, including the LINE1-MET transcript, were not found by a gene-level analysis. We also found that fusion transcripts of transposable elements and hepatitis B virus (HBV) were overexpressed in HCCs. In vitro experiments on DETs showed that LINE1-MET and HBV-human transposable elements promoted cell growth. Furthermore, fusion gene detection showed novel recurrent fusion events that were not detected in the short-reads. These results suggest the efficiency of full-length transcriptome studies and the importance of splicing variants in carcinogenesis.
Genes generate transcripts of various functions by alternative splicing. However, in most transcriptome studies, short-reads sequencing technologies (next-generation sequencers) have been used, leaving full-length transcripts unobserved directly. In this study, we developed an analysis pipeline named SPLICE for long-read transcriptome sequencing and analyzed cDNA sequences from 42 pairs of hepatocellular carcinoma (HCC) and matched non-cancerous livers with an Oxford Nanopore sequencer. Our analysis detected 5,366 novel transcripts and 9,933 differentially expressed transcripts in 4,744 genes between HCCs and non-cancerous livers. An analysis of hepatitis B virus (HBV) transcripts showed that fusion transcripts of the HBV gene and human transposable elements were overexpressed in HBV-infected HCCs. We also identified fusion genes that were not found in the short-reads. These results suggest that long-reads sequencing technologies provide a fuller understanding of cancer transcripts and that our method contributes to the analysis of transcriptome sequences by such technologies.
Affiliation(s)
- Hiroki Kiyose
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Hidewaki Nakagawa
- Laboratory for Cancer Genomics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Atsushi Ono
- Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Hiroshi Aikata
- Department of Gastroenterology and Metabolism, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
| | - Masaki Ueno
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Shinya Hayami
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Hiroki Yamaue
- Department of Surgery II, Wakayama Medical University, Wakayama, Japan
| | - Kazuaki Chayama
- Collaborative Research Laboratory of Medical Innovation, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- Research Center for Hepatology and Gastroenterology, Graduate School of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
- RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Mihoko Shimada
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Jing Hao Wong
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
|