1
|
Eralp B, Sefer E. Reference-free inferring of transcriptomic events in cancer cells on single-cell data. BMC Cancer 2024; 24:607. [PMID: 38769480 PMCID: PMC11107047 DOI: 10.1186/s12885-024-12331-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
BACKGROUND: Cancerous cells' identity is determined via a mixture of multiple factors such as genomic variations, epigenetics, and the regulatory variations that are involved in transcription. The differences in transcriptome expression as well as abnormal structures in peptides determine phenotypical differences. Thus, bulk RNA-seq and more recent single-cell RNA-seq data (scRNA-seq) are important to identify pathogenic differences. In this case, we rely on k-mer decomposition of sequences to identify pathogenic variations in detail which does not need a reference, so it outperforms more traditional Next-Generation Sequencing (NGS) analysis techniques depending on the alignment of the sequences to a reference.
RESULTS: Via our alignment-free analysis, over esophageal and glioblastoma cancer patients, high-frequency variations over multiple different locations (repeats, intergenic regions, exons, introns) as well as multiple different forms (fusion, polyadenylation, splicing, etc.) could be discovered. Additionally, we have analyzed the importance of less-focused events systematically in a classic transcriptome analysis pipeline where these events are considered as indicators for tumor prognosis, tumor prediction, tumor neoantigen inference, as well as their connection with respect to the immune microenvironment.
CONCLUSIONS: Our results suggest that esophageal cancer (ESCA) and glioblastoma processes can be explained via pathogenic microbial RNA, repeated sequences, novel splicing variants, and long intergenic non-coding RNAs (lincRNAs). We expect our application of reference-free process and analysis to be helpful in tumor and normal samples differential scRNA-seq analysis, which in turn offers a more comprehensive scheme for major cancer-associated events.
Affiliation(s)
- Batuhan Eralp
  - Department of Computer Science, Ozyegin University, Istanbul, Turkey
- Emre Sefer
  - Department of Computer Science, Ozyegin University, Istanbul, Turkey
|
2
|
Liehrmann A, Delannoy E, Launay-Avon A, Gilbault E, Loudet O, Castandet B, Rigaill G. DiffSegR: an RNA-seq data driven method for differential expression analysis using changepoint detection. NAR Genom Bioinform 2023; 5:lqad098. [PMID: 37954572 PMCID: PMC10632193 DOI: 10.1093/nargab/lqad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/27/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023]
Abstract
To fully understand gene regulation, it is necessary to have a thorough understanding of both the transcriptome and the enzymatic and RNA-binding activities that shape it. While many RNA-Seq-based tools have been developed to analyze the transcriptome, most only consider the abundance of sequencing reads along annotated patterns (such as genes). These annotations are typically incomplete, leading to errors in the differential expression analysis. To address this issue, we present DiffSegR - an R package that enables the discovery of transcriptome-wide expression differences between two biological conditions using RNA-Seq data. DiffSegR does not require prior annotation and uses a multiple changepoints detection algorithm to identify the boundaries of differentially expressed regions in the per-base log2 fold change. In a few minutes of computation, DiffSegR could rightfully predict the role of chloroplast ribonuclease Mini-III in rRNA maturation and chloroplast ribonuclease PNPase in (3'/5')-degradation of rRNA, mRNA and tRNA precursors as well as intron accumulation. We believe DiffSegR will benefit biologists working on transcriptomics as it allows access to information from a layer of the transcriptome overlooked by the classical differential expression analysis pipelines widely used today. DiffSegR is available at https://aliehrmann.github.io/DiffSegR/index.html.
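The changepoint idea in this abstract can be illustrated with a small Python sketch. It is not DiffSegR's code or API (DiffSegR is an R package); the function names `log2_fold_change` and `segment`, the pseudocount, and the penalty value are all assumptions made for illustration, and the segmentation is a plain quadratic-time optimal-partitioning dynamic programme used as a stand-in for the package's own changepoint algorithm.

```python
import math


def log2_fold_change(cov_a, cov_b, pseudocount=1.0):
    """Per-base log2 fold change of condition B over condition A.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return [
        math.log2((b + pseudocount) / (a + pseudocount))
        for a, b in zip(cov_a, cov_b)
    ]


def segment(signal, penalty):
    """Split `signal` into piecewise-constant segments.

    Minimises the within-segment squared error plus `penalty` per segment
    and returns the exclusive end index of each segment, in order.
    """
    n = len(signal)
    # Prefix sums let us compute any segment's squared error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Squared error of signal[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [math.inf] * n   # best[j]: optimal cost of signal[:j]
    prev = [0] * (n + 1)            # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + cost(i, j) + penalty
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i

    # Walk back through the stored segment starts.
    ends = []
    j = n
    while j > 0:
        ends.append(j)
        j = prev[j]
    return ends[::-1]
```

With coverage that is flat in one condition and rises fourfold over the second half in the other, `segment(log2_fold_change(...), penalty=1.0)` places a single boundary where the fold change shifts; each resulting region could then be tested for differential expression without reference to any annotation.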
Affiliation(s)
- Arnaud Liehrmann
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
- Etienne Delannoy
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Alexandra Launay-Avon
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Elodie Gilbault
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Olivier Loudet
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Benoît Castandet
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Guillem Rigaill
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
|
3
|
Sonti S, Littleton SH, Pahl MC, Zimmerman AJ, Chesi A, Palermo J, Lasconi C, Brown EB, Pippin JA, Wells AD, Doldur-Balli F, Pack AI, Gehrman PR, Keene AC, Grant SFA. Perturbation of the insomnia WDR90 GWAS locus pinpoints rs3752495 as a causal variant influencing distal expression of neighboring gene, PIG-Q. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.17.553739. [PMID: 37645863 PMCID: PMC10462147 DOI: 10.1101/2023.08.17.553739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Although genome wide association studies (GWAS) have been crucial for the identification of loci associated with sleep traits and disorders, the method itself does not directly uncover the underlying causal variants and corresponding effector genes. The overwhelming majority of such variants reside in non-coding regions and are therefore presumed to impact the activity of cis-regulatory elements, such as enhancers. Our previously reported 'variant-to-gene mapping' effort in human induced pluripotent stem cell (iPSC)-derived neural progenitor cells (NPCs), combined with validation in both Drosophila and zebrafish, implicated PIG-Q as a functionally relevant gene at the insomnia 'WDR90' locus. However, importantly that effort did not characterize the corresponding underlying causal variant at this GWAS signal. Specifically, our genome-wide ATAC-seq and high-resolution promoter-focused Capture C datasets generated in this cell setting brought our attention to a shortlist of three tightly neighboring single nucleotide polymorphisms (SNPs) in strong linkage disequilibrium in a candidate intronic enhancer region of WDR90 that contacted the open PIG-Q promoter. The objective of this study was to investigate the influence of the proxy SNPs collectively and then individually on PIG-Q modulation and to pinpoint the causal "regulatory" variant among the three SNPs. Starting at a gross level perturbation, deletion of the entire region harboring all three SNPs in human iPSC-derived neural progenitor cells via CRISPR-Cas9 editing and subsequent RNA sequencing revealed expression changes in specific PIG-Q transcripts. Results from more refined individual luciferase reporter assays for each of the three SNPs in iPSCs revealed that the intronic region with the rs3752495 risk allele induced a ~2.5-fold increase in luciferase expression (n=10). 
Importantly, rs3752495 also exhibited an allele specific effect, with the risk allele increasing the luciferase expression by ~2-fold compared to the non-risk allele. In conclusion, our variant-to-function approach and subsequent in vitro validation implicates rs3752495 as a causal insomnia risk variant embedded at the WDR90-PIG-Q locus.
Affiliation(s)
- Shilpa Sonti
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Sheridan H Littleton
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Matthew C Pahl
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Amber J Zimmerman
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Division of Sleep Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Alessandra Chesi
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Justin Palermo
  - Department of Biology, Texas A&M University, College Station, TX, USA
- Chiara Lasconi
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Elizabeth B Brown
  - Division of Sleep Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- James A Pippin
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Andrew D Wells
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Fusun Doldur-Balli
  - Division of Sleep Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Allan I Pack
  - Division of Sleep Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Phillip R Gehrman
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Alex C Keene
  - Department of Biology, Texas A&M University, College Station, TX, USA
- S F A Grant
  - Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Pediatrics, The University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
  - Divisions of Human Genetics and Endocrinology & Diabetes, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
|
4
|
Salz R, Saraiva-Agostinho N, Vorsteveld E, van der Made CI, Kersten S, Stemerdink M, Allen J, Volders PJ, Hunt SE, Hoischen A, 't Hoen PAC. SUsPECT: a pipeline for variant effect prediction based on custom long-read transcriptomes for improved clinical variant annotation. BMC Genomics 2023; 24:305. [PMID: 37280537 PMCID: PMC10245480 DOI: 10.1186/s12864-023-09391-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 05/19/2023] [Indexed: 06/08/2023]
Abstract
Our incomplete knowledge of the human transcriptome impairs the detection of disease-causing variants, in particular if they affect transcripts only expressed under certain conditions. These transcripts are often lacking from reference transcript sets, such as Ensembl/GENCODE and RefSeq, and could be relevant for establishing genetic diagnoses. We present SUsPECT (Solving Unsolved Patient Exomes/gEnomes using Custom Transcriptomes), a pipeline based on the Ensembl Variant Effect Predictor (VEP) to predict variant impact on custom transcript sets, such as those generated by long-read RNA-sequencing, for downstream prioritization. Our pipeline predicts the functional consequence and likely deleteriousness scores for missense variants in the context of novel open reading frames predicted from any transcriptome. We demonstrate the utility of SUsPECT by uncovering potential mutational mechanisms of pathogenic variants in ClinVar that are not predicted to be pathogenic using the reference transcript annotation. In further support of SUsPECT's utility, we identified an enrichment of immune-related variants predicted to have a more severe molecular consequence when annotating with a newly generated transcriptome from stimulated immune cells instead of the reference transcriptome. Our pipeline outputs crucial information for further prioritization of potentially disease-causing variants for any disease and will become increasingly useful as more long-read RNA sequencing datasets become available.
Affiliation(s)
- Renee Salz
  - Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Nuno Saraiva-Agostinho
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Emil Vorsteveld
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Caspar I van der Made
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
  - Department of Internal Medicine, Radboud Institute for Molecular Life Sciences, and Radboud Expertise Center for Immunodeficiency and Autoinflammation, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Simone Kersten
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Merel Stemerdink
  - Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
- Jamie Allen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pieter-Jan Volders
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - Laboratory of Molecular Diagnostics, Department of Clinical Biology, Jessa Hospital, Hasselt, 3500, Belgium
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
  - Department of Internal Medicine, Radboud Institute for Molecular Life Sciences, and Radboud Expertise Center for Immunodeficiency and Autoinflammation, Radboud University Medical Center for Infectious Diseases (RCI), Radboud University Medical Center, Nijmegen, the Netherlands
- Peter A C 't Hoen
  - Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, the Netherlands
|
5
|
Ste-Croix DT, Bélanger RR, Mimee B. Single Nematode Transcriptomic Analysis, Using Long-Read Technology, Reveals Two Novel Virulence Gene Candidates in the Soybean Cyst Nematode, Heterodera glycines. Int J Mol Sci 2023; 24:ijms24119440. [PMID: 37298400 DOI: 10.3390/ijms24119440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 05/18/2023] [Accepted: 05/24/2023] [Indexed: 06/12/2023]
Abstract
The soybean cyst nematode (Heterodera glycines, SCN) is the most damaging disease of soybean in North America. While management of this pest using resistant soybean is generally still effective, prolonged exposure to cultivars derived from the same source of resistance (PI 88788) has led to the emergence of virulence. Currently, the underlying mechanisms responsible for resistance breakdown remain unknown. In this study, we combined a single nematode transcriptomic profiling approach with long-read sequencing to reannotate the SCN genome. This resulted in the annotation of 1932 novel transcripts and 281 novel gene features. Using a transcript-level quantification approach, we identified eight novel effector candidates overexpressed in PI 88788 virulent nematodes in the late infection stage. Among these were the novel gene Hg-CPZ-1 and a pioneer effector transcript generated through the alternative splicing of the non-effector gene Hetgly21698. While our results demonstrate that alternative splicing in effectors does occur, we found limited evidence of direct involvement in the breakdown of resistance. However, our analysis highlighted a distinct pattern of effector upregulation in response to PI 88788 resistance indicative of a possible adaptation process by SCN to host resistance.
Affiliation(s)
- Dave T Ste-Croix
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
  - Département de Phytologie, Université Laval, Québec, QC G1V 0A6, Canada
- Richard R Bélanger
  - Département de Phytologie, Université Laval, Québec, QC G1V 0A6, Canada
  - Centre de Recherche et d'Innovation sur les Végétaux (CRIV), Université Laval, Québec, QC G1V 0A6, Canada
- Benjamin Mimee
  - Saint-Jean-sur-Richelieu Research and Development Centre, Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu, QC J3B 3E6, Canada
|
6
|
Maxeiner S, Krasteva-Christ G, Althaus M. Pitfalls of using sequence databases for heterologous expression studies - a technical review. J Physiol 2023; 601:1611-1623. [PMID: 36762618 DOI: 10.1113/jp284066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 02/07/2023] [Indexed: 02/11/2023]
Abstract
Synthesis of DNA fragments based on gene sequences that are available in public resources has become an efficient and affordable method that has gradually replaced traditional cloning efforts such as PCR cloning from cDNA. However, database entries based on genome sequencing results are prone to errors which can lead to false sequence information and, ultimately, errors in functional characterisation of proteins such as ion channels and transporters in heterologous expression systems. We have identified five common problems that repeatedly appear in public resources: (1) Not every gene has yet been annotated; (2) not all gene annotations are necessarily correct; (3) transcripts may contain automated corrections; (4) there are mismatches between gene, mRNA and protein sequences; and (5) splicing patterns often lack experimental validation. This technical review highlights and provides a strategy to bypass these issues in order to avoid critical mistakes that could impact future studies of any gene/protein of interest in heterologous expression systems.
Affiliation(s)
- Stephan Maxeiner
  - Institute for Anatomy and Cell Biology, Saarland University, Homburg, Germany
- Mike Althaus
  - Department of Natural Sciences, Institute for Functional Gene Analytics, Bonn-Rhein-Sieg University of Applied Sciences, Rheinbach, Germany
|
7
|
Orabi B, Xie N, McConeghy B, Dong X, Chauve C, Hach F. Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing. Nucleic Acids Res 2022; 51:e11. [PMID: 36478271 PMCID: PMC9881145 DOI: 10.1093/nar/gkac1112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 10/26/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from being comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely spans more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing presents a promising potential for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. It then partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem-the Minimum Error Clustering into Isoforms (MErCi) problem-and is solved using integer linear programming (ILP). 
We compare the performance of Freddie on simulated datasets with other isoform detection tools with varying dependence on annotation databases. We show that Freddie outperforms the other tools in its accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie results in isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
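The canonical-segment representation described in this abstract can be pictured with a deliberately simplified Python sketch. It is not Freddie's implementation: here the segments come from cutting at every alignment boundary seen in any read (instead of the dynamic-programming optimisation the abstract mentions), and reads are grouped only when their segment vectors are identical (instead of the ILP-based clustering with error correction). The names `canonical_segments`, `read_vector`, and `group_reads` are hypothetical.

```python
def canonical_segments(read_alignments):
    """Cut the locus at every alignment boundary observed in any read.

    `read_alignments` is a list of reads; each read is a list of
    (start, end) aligned intervals on the genome.
    """
    points = sorted(
        {p for intervals in read_alignments for iv in intervals for p in iv}
    )
    return list(zip(points, points[1:]))


def read_vector(intervals, segments):
    """Encode a read as 1/0 per segment: 1 if the segment lies fully
    inside one of the read's aligned intervals, 0 otherwise."""
    return tuple(
        int(any(start <= s and e <= end for start, end in intervals))
        for s, e in segments
    )


def group_reads(read_alignments):
    """Group read indices by identical segment vectors.

    Each distinct vector is a candidate isoform expressed as a subset
    of the canonical segments.
    """
    segments = canonical_segments(read_alignments)
    groups = {}
    for idx, intervals in enumerate(read_alignments):
        groups.setdefault(read_vector(intervals, segments), []).append(idx)
    return segments, groups
```

For example, two reads aligned to `(0, 100)` and `(200, 300)` and a third aligned to `(0, 100)` and `(400, 500)` fall into two groups that share the first segment and differ in the downstream one. Real long reads carry alignment noise, which is why an exact-match grouping like this one would over-split and an error-tolerant formulation is needed in practice.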
Affiliation(s)
- Baraa Orabi
  - Department of Computer Science, the University of British Columbia, Vancouver, BC, Canada
- Ning Xie
  - Vancouver Prostate Centre, Vancouver, BC, Canada
- Xuesen Dong
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC, Canada
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Faraz Hach
  - To whom correspondence should be addressed.
|
8
|
Nanopore microscope identifies RNA isoforms with structural colours. Nat Chem 2022; 14:1258-1264. [PMID: 36123450 DOI: 10.1038/s41557-022-01037-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/24/2021] [Accepted: 08/04/2022] [Indexed: 11/08/2022]
Abstract
Identifying RNA transcript isoforms requires intricate protocols that suffer from various enzymatic biases. Here we design three-dimensional molecular constructs that enable identification of transcript isoforms at the single-molecule level using solid-state nanopore microscopy. We refold target RNA into RNA identifiers with designed sets of complementary DNA strands. Each reshaped molecule carries a unique sequence of structural (pseudo)colours. Structural colours consist of DNA structures, protein labels, native RNA structures or a combination of all three. The sequence of structural colours of RNA identifiers enables simultaneous identification and relative quantification of multiple RNA targets without prior amplification. Our Amplification-free RNA TargEt Multiplex Isoform Sensing (ARTEMIS) method reveals structural arrangements in native transcripts in agreement with published variants. ARTEMIS discriminates circular and linear transcript isoforms in a one-step, enzyme-free reaction in a complex human transcriptome using single-molecule read-out.
|
9
|
Wang Y, Xue H, Aglave M, Lainé A, Gallopin M, Gautheret D. The contribution of uncharted RNA sequences to tumor identity in lung adenocarcinoma. NAR Cancer 2022; 4:zcac001. [PMID: 35118386 PMCID: PMC8807116 DOI: 10.1093/narcan/zcac001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Revised: 11/18/2021] [Accepted: 01/10/2022] [Indexed: 11/12/2022]
Abstract
The identity of cancer cells is defined by the interplay between genetic, epigenetic, transcriptional and post-transcriptional variation. A lot of this variation is present in RNA-seq data and can be captured at once using reference-free, k-mer analysis. An important issue with k-mer analysis, however, is the difficulty of distinguishing signal from noise. Here, we use two independent lung adenocarcinoma datasets to identify all reproducible events at the k-mer level, in a tumor versus normal setting. We find reproducible events in many different locations (introns, intergenic, repeats) and forms (spliced, polyadenylated, chimeric etc.). We systematically analyze events that are ignored in conventional transcriptomics and assess their value as biomarkers and for tumor classification, survival prediction, neoantigen prediction and correlation with the immune microenvironment. We find that unannotated lincRNAs, novel splice variants, endogenous HERV, Line1 and Alu repeats and bacterial RNAs each contribute to different, important aspects of tumor identity. We argue that differential RNA-seq analysis of tumor/normal sample collections would benefit from this type of k-mer analysis to cast a wider net on important cancer-related events. The code is available at https://github.com/Transipedia/dekupl-lung-cancer-inter-cohort.
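As a rough illustration of reference-free k-mer analysis in a tumor versus normal setting, the following Python sketch counts k-mers directly from reads and keeps only those flagged as tumor-enriched in two independent cohorts. It is not the authors' pipeline from the linked repository; the function names, the ratio threshold, and the pseudocount are assumptions chosen for the example, and a real analysis would use a proper statistical test instead of a fixed ratio.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def tumor_enriched(tumor_reads, normal_reads, k, min_ratio=2.0, pseudocount=1):
    """Return k-mers whose tumor count exceeds the normal count by at
    least `min_ratio` (both counts smoothed by `pseudocount`)."""
    tumor = kmer_counts(tumor_reads, k)
    normal = kmer_counts(normal_reads, k)
    return {
        kmer
        for kmer, n in tumor.items()
        if (n + pseudocount) / (normal.get(kmer, 0) + pseudocount) >= min_ratio
    }


def reproducible_events(cohort_a, cohort_b, k):
    """Keep only k-mers that are tumor-enriched in both cohorts.

    Each cohort is a (tumor_reads, normal_reads) pair.
    """
    return tumor_enriched(*cohort_a, k) & tumor_enriched(*cohort_b, k)
```

Requiring agreement between cohorts is one simple way to approach the signal-versus-noise problem the abstract raises: a k-mer that is enriched by chance in one dataset is unlikely to be enriched again in an independent one. No reference genome or annotation enters the computation at any point.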
Affiliation(s)
- Yunfeng Wang
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Annoroad Gene Technology Co., Ltd, 100176 Beijing, China
- Haoliang Xue
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Marine Aglave
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
- Antoine Lainé
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190, Gif-sur-Yvette, France
  - Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
|
10
|
Artificial Intelligence in Blood Transcriptomics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
11
|
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol 2021; 22:323. [PMID: 34844637 PMCID: PMC8628444 DOI: 10.1186/s13059-021-02533-6] [Citation(s) in RCA: 82] [Impact Index Per Article: 27.3] [Received: 07/12/2021] [Accepted: 10/29/2021] [Indexed: 12/12/2022]
Abstract
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.
Affiliation(s)
- Christopher Wilks
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Shijie C Zheng
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Rone Charles
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
  - Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jonathan P Ling
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
- Eddie Luidy Imada
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- David Zhang
  - Institute of Child Health, University College London (UCL), London, UK
- Jeffrey T Leek
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Andrew E Jaffe
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  - Lieber Institute for Brain Development, Baltimore, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Abhinav Nellore
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
  - Department of Surgery, Oregon Health & Science University, Portland, OR, USA
- Kasper D Hansen
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Ben Langmead
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
|
12
|
Babarinde IA, Ma G, Li Y, Deng B, Luo Z, Liu H, Abdul MM, Ward C, Chen M, Fu X, Shi L, Duttlinger M, He J, Sun L, Li W, Zhuang Q, Tong G, Frampton J, Cazier JB, Chen J, Jauch R, Esteban MA, Hutchins AP. Transposable element sequence fragments incorporated into coding and noncoding transcripts modulate the transcriptome of human pluripotent stem cells. Nucleic Acids Res 2021; 49:9132-9153. [PMID: 34390351 PMCID: PMC8450112 DOI: 10.1093/nar/gkab710] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/15/2020] [Revised: 07/29/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) occupy nearly 40% of mammalian genomes and, whilst most are fragmentary and no longer capable of transposition, they can nevertheless contribute to cell function. TEs within genes transcribed by RNA polymerase II can be copied as parts of primary transcripts; however, their full contribution to mature transcript sequences remains unresolved. Here, using long and short read (LR and SR) RNA sequencing data, we show that 26% of coding and 65% of noncoding transcripts in human pluripotent stem cells (hPSCs) contain TE-derived sequences. Different TE families are incorporated into RNAs in unique patterns, with consequences to transcript structure and function. The presence of TE sequences within a transcript is correlated with TE-type specific changes in its subcellular distribution, alterations in steady-state levels and half-life, and differential association with RNA Binding Proteins (RBPs). We identify hPSC-specific incorporation of endogenous retroviruses (ERVs) and LINE:L1 into protein-coding mRNAs, which generate TE sequence-derived peptides. Finally, single cell RNA-seq reveals that hPSCs express ERV-containing transcripts, whilst differentiating subpopulations lack ERVs and express SINE and LINE-containing transcripts. Overall, our comprehensive analysis demonstrates that the incorporation of TE sequences into the RNAs of hPSCs is more widespread and has a greater impact than previously appreciated.
Affiliation(s)
- Isaac A Babarinde
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Gang Ma
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Yuhao Li
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Boping Deng
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Zhiwei Luo
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Hao Liu
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Mazid Md Abdul
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Carl Ward
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Minchun Chen
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiuling Fu
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Liyang Shi
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Martha Duttlinger
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Jiangping He
  - Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
- Li Sun
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Wenjuan Li
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Qiang Zhuang
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Guoqing Tong
  - Center for Reproductive Medicine, Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200120, China
- Jon Frampton
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Jean-Baptiste Cazier
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham B15 2TT, UK
  - Centre for Computational Biology, University of Birmingham, Birmingham, UK
- Jiekai Chen
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Center for Cell Lineage and Atlas (CCLA), Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
  - Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- Ralf Jauch
  - School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Miguel A Esteban
  - Laboratory of Integrative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Key Laboratory of Regenerative Biology of the Chinese Academy of Sciences and Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou 510005, China
- Andrew P Hutchins
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
|
13
|
Riquier S, Bessiere C, Guibert B, Bouge AL, Boureux A, Ruffle F, Audoux J, Gilbert N, Xue H, Gautheret D, Commes T. Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-seq datasets. NAR Genom Bioinform 2021; 3:lqab058. [PMID: 34179780 PMCID: PMC8221386 DOI: 10.1093/nargab/lqab058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Revised: 05/10/2021] [Accepted: 06/17/2021] [Indexed: 11/12/2022]
Abstract
The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a trove of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and so does not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata including library protocol, sample features or contaminations from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications.
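As a rough illustration of what k-mer decomposition and a gene-specific k-mer signature mean, the sketch below splits toy sequences into k-mers, keeps those found in only one gene, and counts signature hits in reads. It is not Kmerator's implementation: the function names, the toy sequences, and the choice to ignore reverse complements and the rest of the genome are all assumptions made for brevity.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def specific_kmers(transcripts, k):
    """Map each gene to the k-mers present in that gene and in no other.

    transcripts: dict of gene name -> sequence (hypothetical toy input).
    """
    owners = defaultdict(set)  # k-mer -> genes that contain it
    for gene, seq in transcripts.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(gene)
    result = {gene: set() for gene in transcripts}
    for kmer, genes in owners.items():
        if len(genes) == 1:  # seen in exactly one gene
            result[next(iter(genes))].add(kmer)
    return result

def count_signature(reads, signature, k):
    """Count how many k-mer positions in the reads hit the signature."""
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            if read[i:i + k] in signature:
                total += 1
    return total

# Toy data, invented for illustration only.
toy = {"geneA": "ACGTACGGA", "geneB": "TTACGTACC"}
sig = specific_kmers(toy, k=5)
hits = count_signature(["GTACGGA"], sig["geneA"], k=5)
```

A real signature would also have to be checked against the whole genome and transcriptome, and both strands, before a k-mer could be called specific.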
Affiliation(s)
- Sébastien Riquier
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Chloé Bessiere
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Benoit Guibert
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Anthony Boureux
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Florence Ruffle
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Nicolas Gilbert
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
- Haoliang Xue
  - Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Saclay, 91198, Gif sur Yvette, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Saclay, 91198, Gif sur Yvette, France
- Thérèse Commes
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France
|
14
|
Riquier S, Mathieu M, Bessiere C, Boureux A, Ruffle F, Lemaitre JM, Djouad F, Gilbert N, Commes T. Long non-coding RNA exploration for mesenchymal stem cell characterisation. BMC Genomics 2021; 22:412. [PMID: 34088266 PMCID: PMC8178833 DOI: 10.1186/s12864-020-07289-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2020] [Accepted: 11/28/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have created new avenues of transcriptional marker search. The long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with a potential for high tissue specificity and function. Therefore, we tested the biomarker potential of lncRNAs on Mesenchymal Stem Cells (MSCs), a complex type of adult multipotent stem cells of diverse tissue origins, that is frequently used in clinics but which is lacking extensive characterization. RESULTS We developed a dedicated bioinformatics pipeline for the purpose of building a cell-specific catalogue of unannotated lncRNAs. The pipeline performs ab initio transcript identification, pseudoalignment and uses new methodologies such as a specific k-mer approach for naive quantification of expression in numerous RNAseq data. We next applied it on MSCs, and our pipeline was able to highlight novel lncRNAs with high cell specificity. Furthermore, with original and efficient approaches for functional prediction, we demonstrated that each candidate represents one specific state of MSCs biology. CONCLUSIONS We showed that our approach can be employed to harness lncRNAs as cell markers. More specifically, our results suggest different candidates as potential actors in MSCs biology and propose promising directions for future experimental investigations.
Affiliation(s)
- Sébastien Riquier
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Marc Mathieu
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Chloé Bessiere
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Anthony Boureux
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Florence Ruffle
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Jean-Marc Lemaitre
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Farida Djouad
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Nicolas Gilbert
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
- Thérèse Commes
  - IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, Montpellier, France
|
15
|
African Americans and European Americans exhibit distinct gene expression patterns across tissues and tumors associated with immunologic functions and environmental exposures. Sci Rep 2021; 11:9905. [PMID: 33972602 PMCID: PMC8110974 DOI: 10.1038/s41598-021-89224-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/01/2020] [Accepted: 04/21/2021] [Indexed: 12/20/2022]
Abstract
The COVID-19 pandemic has affected African American populations disproportionately with respect to prevalence, and mortality. Expression profiles represent snapshots of combined genetic, socio-environmental (including socioeconomic and environmental factors), and physiological effects on the molecular phenotype. As such, they have potential to improve biological understanding of differences among populations, and provide therapeutic biomarkers and environmental mitigation strategies. Here, we undertook a large-scale assessment of patterns of gene expression between African Americans and European Americans, mining RNA-Seq data from 25 non-diseased and diseased (tumor) tissue-types. We observed the widespread enrichment of pathways implicated in COVID-19 and integral to inflammation and reactive oxygen stress. Chemokine CCL3L3 expression is up-regulated in African Americans. GSTM1, encoding a glutathione S-transferase that metabolizes reactive oxygen species and xenobiotics, is upregulated. The little-studied F8A2 gene is up to 40-fold more highly expressed in African Americans; F8A2 encodes HAP40 protein, which mediates endosome movement, potentially altering the cellular response to SARS-CoV-2. African American expression signatures, superimposed on single cell-RNA reference data, reveal increased number or activity of esophageal glandular cells and lung ACE2-positive basal keratinocytes. Our findings establish basal prognostic signatures that can be used to refine approaches to minimize risk of severe infection and improve precision treatment of COVID-19 for African Americans. To enable dissection of causes of divergent molecular phenotypes, we advocate routine inclusion of metadata on genomic and socio-environmental factors for human RNA-sequencing studies.
|
16
|
Eagles NJ, Burke EE, Leonard J, Barry BK, Stolz JM, Huuki L, Phan BN, Serrato VL, Gutiérrez-Millán E, Aguilar-Ordoñez I, Jaffe AE, Collado-Torres L. SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses. BMC Bioinformatics 2021; 22:224. [PMID: 33932985 PMCID: PMC8088074 DOI: 10.1186/s12859-021-04142-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/29/2021] [Accepted: 04/21/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step (such as alignment of reads to a reference genome) of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing, as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.
Affiliation(s)
- Nicholas J Eagles
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Emily E Burke
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Jacob Leonard
  - Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
  - QuestBridge Scholar, Palo Alto, CA, 94303, USA
- Brianna K Barry
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Joshua M Stolz
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Louise Huuki
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- BaDoi N Phan
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
  - Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Violeta Larios Serrato
  - Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
  - Instituto Politécnico Nacional, Escuela Nacional de Ciencias Biológicas, Mexico City, CDMX, 11340, Mexico
- Israel Aguilar-Ordoñez
  - Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
  - Department of Supercomputing, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, CDMX, 14610, Mexico
- Andrew E Jaffe
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
  - Department of Genetic Medicine, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Leonardo Collado-Torres
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
|
17
|
Nguyen HTN, Xue H, Firlej V, Ponty Y, Gallopin M, Gautheret D. Reference-free transcriptome signatures for prostate cancer prognosis. BMC Cancer 2021; 21:394. [PMID: 33845808 PMCID: PMC8040209 DOI: 10.1186/s12885-021-08021-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/18/2020] [Accepted: 03/09/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which amounts to ignoring large numbers of non-canonical RNAs produced in disease tissues. A recently introduced kind of transcriptome classifier operates entirely in a reference-free manner, relying on k-mers extracted from patient RNA-seq data. METHODS In this paper, we set out to compare conventional and reference-free signatures in risk and relapse prediction of prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature and evaluates this signature in an independent dataset. RESULTS We find that both gene-based and k-mer based classifiers had similarly high performances for risk prediction and a markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or variable regions of cancer driver genes that were not part of gene-based signatures. CONCLUSIONS Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.
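To make the notion of a k-mer count matrix concrete, here is a minimal sketch that turns per-sample reads into a samples-by-k-mers table. It only illustrates the data structure and is not the authors' pipeline: the function names and toy reads are hypothetical, and real workflows use dedicated k-mer counters, normalisation and filtering that are omitted here.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def kmer_count_matrix(samples, k):
    """Build a samples x k-mers count table.

    samples: dict of sample name -> list of reads (toy input).
    Returns (sample_names, kmer_list, rows), where rows[i][j] is the
    count of kmer_list[j] in sample_names[i].
    """
    per_sample = {name: count_kmers(reads, k) for name, reads in samples.items()}
    # Union of all k-mers seen in any sample, sorted for a stable column order.
    kmer_list = sorted(set().union(*per_sample.values()))
    names = list(per_sample)
    rows = [[per_sample[n][km] for km in kmer_list] for n in names]
    return names, kmer_list, rows

# Toy reads, invented for illustration only.
names, kmer_list, rows = kmer_count_matrix(
    {"s1": ["ACGT"], "s2": ["CGTT", "ACG"]}, k=3
)
```

A matrix of this shape can stand in for a gene expression matrix as classifier input, which is what allows the two kinds of signature to be compared through one common procedure.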
Affiliation(s)
- Ha T N Nguyen
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Haoliang Xue
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Virginie Firlej
  - Institute of Biology, Université Paris Est Creteil, Creteil, Creteil, France
- Yann Ponty
  - LIX CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Melina Gallopin
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France
|
18
|
Warnat-Herresthal S, Oestreich M, Schultze JL, Becker M. Artificial Intelligence in Blood Transcriptomics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
19
|
Uhl M, Tran VD, Backofen R. Improving CLIP-seq data analysis by incorporating transcript information. BMC Genomics 2020; 21:894. [PMID: 33334306 PMCID: PMC7745353 DOI: 10.1186/s12864-020-07297-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/22/2020] [Accepted: 12/02/2020] [Indexed: 12/26/2022]
Abstract
BACKGROUND Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take into account genomic read profiles, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue closely. RESULTS Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify its extent in publicly available datasets, which turns out to be substantial. By providing a tool called CLIPcontext for automatic transcript and genomic context sequence extraction, we further demonstrate that context choice affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows. CONCLUSIONS Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.
Affiliation(s)
- Michael Uhl
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany
- Van Dinh Tran
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, Freiburg, 79104, Germany
|
20
|
An improved de novo assembling and polishing of Solea senegalensis transcriptome shed light on retinoic acid signalling in larvae. Sci Rep 2020; 10:20654. [PMID: 33244091 PMCID: PMC7691524 DOI: 10.1038/s41598-020-77201-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/19/2020] [Accepted: 11/06/2020] [Indexed: 12/17/2022]
Abstract
Senegalese sole is an economically important flatfish species in aquaculture and an attractive model to decipher the molecular mechanisms governing the severe transformations occurring during metamorphosis, where retinoic acid seems to play a key role in tissue remodeling. In this study, a robust sole transcriptome was envisaged by reducing the number of assembled libraries (27 out of 111 available), fine-tuning a new automated and reproducible set of workflows for de novo assembling based on several assemblers, and removing low confidence transcripts after mapping onto a sole female genome draft. From a total of 96 resulting assemblies, two "raw" transcriptomes, one containing only Illumina reads and another with Illumina and GS-FLX reads, were selected to provide SOLSEv5.0, the most informative transcriptome with low redundancy and devoid of most single-exon transcripts. It included both Illumina and GS-FLX reads and consisted of 51,348 transcripts of which 22,684 code for 17,429 different proteins described in databases, where 9527 were predicted as complete proteins. SOLSEv5.0 was used as reference for the study of retinoic acid (RA) signalling in sole larvae using drug treatments (DEAB, a RA synthesis blocker, and TTNPB, a RA-receptor agonist) for 24 and 48 h. Differential expression and functional interpretation were facilitated by an updated version of DEGenes Hunter. Acute exposure of both drugs triggered an intense, specific and transient response at 24 h but with hardly observable differences after 48 h at least in the DEAB treatments. Activation of RA signalling by TTNPB specifically increased the expression of genes in pathways related to RA degradation, retinol storage, carotenoid metabolism, homeostatic response and visual cycle, and also modified the expression of transcripts related to morphogenesis and collagen fibril organisation. 
In contrast, DEAB mainly decreased genes related to retinal production, impairing phototransduction signalling in the retina. A total of 755 transcripts mainly related to lipid metabolism, lipid transport and lipid homeostasis were altered in response to both treatments, indicating non-specific drug responses associated with intestinal absorption. These results indicate that a new assembling and transcript sieving were both necessary to provide a reliable transcriptome to identify the many aspects of RA action during sole development that are of relevance for sole aquaculture.
|
21
|
Qiu Z, Chen S, Qi Y, Liu C, Zhai J, Xie S, Ma C. Exploring transcriptional switches from pairwise, temporal and population RNA-Seq data using deepTS. Brief Bioinform 2020; 22:5877690. [PMID: 32728687 DOI: 10.1093/bib/bbaa137] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/02/2020] [Revised: 05/25/2020] [Accepted: 06/05/2020] [Indexed: 12/11/2022]
Abstract
Transcriptional switch (TS) is a widely observed phenomenon caused by changes in the relative expression of transcripts from the same gene, in spatial, temporal or other dimensions. TS has been associated with human diseases, plant development and stress responses. Its investigation is often hampered by a lack of suitable tools allowing comprehensive and flexible TS analysis for high-throughput RNA sequencing (RNA-Seq) data. Here, we present deepTS, a user-friendly web-based implementation that enables a fully interactive, multifunctional identification, visualization and analysis of TS events for large-scale RNA-Seq datasets from pairwise, temporal and population experiments. deepTS offers rich functionality to streamline RNA-Seq-based TS analysis for both model and non-model organisms and for those with or without reference transcriptome. The presented case studies highlight the capabilities of deepTS and demonstrate its potential for the transcriptome-wide TS analysis of pairwise, temporal and population RNA-Seq data. We believe deepTS will help research groups, regardless of their informatics expertise, perform accessible, reproducible and collaborative TS analyses of large-scale RNA-Seq data.
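The definition of a transcriptional switch given above (a change in the relative expression of transcripts from the same gene) can be sketched for the pairwise case as follows. This is a simplified illustration, not deepTS's algorithm: the function names, the `min_shift` threshold of 0.2, and the rule that the dominant isoform must change are all assumptions.

```python
def isoform_fractions(expr):
    """Convert per-transcript expression values into fractions of the gene total."""
    total = sum(expr.values())
    if total == 0:
        return {t: 0.0 for t in expr}
    return {t: v / total for t, v in expr.items()}

def detect_switch(cond_a, cond_b, min_shift=0.2):
    """Return (dominant_in_a, dominant_in_b) if a switch is called, else None.

    cond_a, cond_b: non-empty dicts of transcript id -> expression for one
    gene. A switch is called when the dominant isoform differs between the
    conditions and each dominant isoform's share changes by at least
    min_shift (a hypothetical threshold).
    """
    fa, fb = isoform_fractions(cond_a), isoform_fractions(cond_b)
    dom_a = max(fa, key=fa.get)
    dom_b = max(fb, key=fb.get)
    if dom_a == dom_b:
        return None
    loss = fa[dom_a] - fb.get(dom_a, 0.0)  # share lost by the old dominant isoform
    gain = fb[dom_b] - fa.get(dom_b, 0.0)  # share gained by the new dominant isoform
    if loss >= min_shift and gain >= min_shift:
        return dom_a, dom_b
    return None

# Toy expression values, invented for illustration only.
switch = detect_switch({"tx1": 80, "tx2": 20}, {"tx1": 30, "tx2": 70})
no_switch = detect_switch({"tx1": 80, "tx2": 20}, {"tx1": 60, "tx2": 40})
```

Temporal and population designs would need replicate-aware statistics rather than a single threshold on two profiles.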
Affiliation(s)
- Chuang Ma
  - Bioinformatics Laboratory at Northwest A&F University
|
22
|
Ray TA, Cochran K, Kozlowski C, Wang J, Alexander G, Cady MA, Spencer WJ, Ruzycki PA, Clark BS, Laeremans A, He MX, Wang X, Park E, Hao Y, Iannaccone A, Hu G, Fedrigo O, Skiba NP, Arshavsky VY, Kay JN. Comprehensive identification of mRNA isoforms reveals the diversity of neural cell-surface molecules with roles in retinal development and disease. Nat Commun 2020; 11:3328. [PMID: 32620864 PMCID: PMC7335077 DOI: 10.1038/s41467-020-17009-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 10/15/2019] [Accepted: 05/30/2020] [Indexed: 02/08/2023]
Abstract
Genes encoding cell-surface proteins control nervous system development and are implicated in neurological disorders. These genes produce alternative mRNA isoforms which remain poorly characterized, impeding understanding of how disease-associated mutations cause pathology. Here we introduce a strategy to define complete portfolios of full-length isoforms encoded by individual genes. Applying this approach to neural cell-surface molecules, we identify thousands of unannotated isoforms expressed in retina and brain. By mass spectrometry we confirm expression of newly-discovered proteins on the cell surface in vivo. Remarkably, we discover that the major isoform of a retinal degeneration gene, CRB1, was previously overlooked. This CRB1 isoform is the only one expressed by photoreceptors, the affected cells in CRB1 disease. Using mouse mutants, we identify a function for this isoform at photoreceptor-glial junctions and demonstrate that loss of this isoform accelerates photoreceptor death. Therefore, our isoform identification strategy enables discovery of new gene functions relevant to disease.
Affiliation(s)
- Thomas A Ray
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Kelly Cochran
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Chris Kozlowski
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Jingjing Wang
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Graham Alexander
  - Center for Genomic and Computational Biology, Duke University, Durham, NC, 27710, USA
- Philip A Ruzycki
  - John F. Hardesty, M.D. Department of Ophthalmology and Visual Sciences, Washington University, St. Louis, MO, 63110, USA
- Brian S Clark
  - John F. Hardesty, M.D. Department of Ophthalmology and Visual Sciences, Washington University, St. Louis, MO, 63110, USA
  - Department of Developmental Biology, Washington University, St. Louis, MO, 63110, USA
- Ming-Xiao He
  - Advanced Cell Diagnostics, Newark, CA, 94560, USA
- Emily Park
  - Advanced Cell Diagnostics, Newark, CA, 94560, USA
- Ying Hao
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Alessandro Iannaccone
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Gary Hu
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Olivier Fedrigo
  - Center for Genomic and Computational Biology, Duke University, Durham, NC, 27710, USA
  - The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Nikolai P Skiba
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Vadim Y Arshavsky
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
- Jeremy N Kay
  - Department of Neurobiology, Duke University School of Medicine, Durham, NC, 27710, USA
  - Department of Ophthalmology, Duke University School of Medicine, Durham, NC, 27710, USA
|
23
|
Matsumoto H, Hayashi T, Ozaki H, Tsuyuzaki K, Umeda M, Iida T, Nakamura M, Okano H, Nikaido I. An NMF-based approach to discover overlooked differentially expressed gene regions from single-cell RNA-seq data. NAR Genom Bioinform 2019; 2:lqz020. [PMID: 34632380 PMCID: PMC8499053 DOI: 10.1093/nargab/lqz020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/24/2019] [Revised: 11/05/2019] [Accepted: 11/29/2019] [Indexed: 12/31/2022]
Abstract
Single-cell RNA sequencing has enabled researchers to quantify the transcriptomes of individual cells, infer cell types and investigate differential expression among cell types, which will lead to a better understanding of the regulatory mechanisms of cell states. Transcript diversity caused by phenomena such as aberrant splicing events has been revealed, and differential expression of previously unannotated transcripts might be overlooked by annotation-based analyses. Accordingly, we have developed an approach to discover overlooked differentially expressed (DE) gene regions that complements annotation-based methods. Our algorithm decomposes the mapped count data matrix for a gene region using non-negative matrix factorization, quantifies the differential expression level based on the decomposed matrix, and compares it with the differential expression level obtained from an annotation-based approach to discover previously unannotated DE transcripts. We performed single-cell RNA sequencing for human neural stem cells and applied our algorithm to the dataset. We also applied our algorithm to two public single-cell RNA sequencing datasets corresponding to mouse ES and primitive endoderm cells, and human preimplantation embryos. As a result, we discovered several intriguing DE transcripts, including a transcript related to the modulation of neural stem/progenitor cell differentiation.
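The decomposition step named in this abstract, non-negative matrix factorization of a count matrix, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions and not the authors' algorithm: it uses the standard Lee-Seung multiplicative updates for the squared-error objective, the function names and the toy matrix are invented here, and the later steps (scoring differential expression and comparing with annotation-based results) are not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor non-negative v (rows x cols) into w (rows x rank) and h (rank x cols)
    with Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


def reconstruction_error(v, w, h):
    """Sum of squared differences between v and the product w h."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


# Toy matrix (invented): rows are bins along a gene region, columns are cells.
# Cells 0-1 cover the first two bins, cells 2-3 cover the last two.
counts = [
    [4.0, 6.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 8.0],
    [0.0, 0.0, 3.0, 4.0],
]
w, h = nmf(counts, rank=2)
print("squared reconstruction error:", reconstruction_error(counts, w, h))
```

In this toy setting each column of `w` can be read as a coverage pattern along the region and each row of `h` as the per-cell weight of that pattern, which is the kind of structure a downstream differential expression score could be computed from.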
Affiliation(s)
- Hirotaka Matsumoto
  - Medical Image Analysis Team, RIKEN Center for Advanced Intelligence Project, Nihonbashi 1-chome Mitsui Building 15F, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Tetsutaro Hayashi
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Haruka Ozaki
  - Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
  - Bioinformatics Laboratory, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Koki Tsuyuzaki
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Mana Umeda
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Tsuyoshi Iida
  - Department of Orthopaedic Surgery, Keio University School of Medicine, 35 Sinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Masaya Nakamura
  - Department of Orthopaedic Surgery, Keio University School of Medicine, 35 Sinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Hideyuki Okano
  - Department of Physiology, Keio University School of Medicine, 35 Sinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan
- Itoshi Nikaido
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Bioinformatics Course, Master's/Doctoral Program in Life Science Innovation (T-LSI), School of Integrative and Global Majors (SIGMA), University of Tsukuba, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
|
24
|
Pinskaya M, Saci Z, Gallopin M, Gabriel M, Nguyen HT, Firlej V, Descrimes M, Rapinat A, Gentien D, Taille ADL, Londoño-Vallejo A, Allory Y, Gautheret D, Morillon A. Reference-free transcriptome exploration reveals novel RNAs for prostate cancer diagnosis. Life Sci Alliance 2019; 2:2/6/e201900449. [PMID: 31732695 PMCID: PMC6858606 DOI: 10.26508/lsa.201900449] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/04/2019] [Revised: 11/05/2019] [Accepted: 11/05/2019] [Indexed: 12/24/2022]
Abstract
The use of RNA-sequencing technologies held a promise of improved diagnostic tools based on comprehensive transcript sets. However, mining human transcriptome data for disease biomarkers in clinical specimens is restricted by the limited power of conventional reference-based protocols relying on unique and annotated transcripts. Here, we implemented a blind reference-free computational protocol, DE-kupl, to infer yet unreferenced RNA variations from total stranded RNA-sequencing datasets of tissue origin. As a bench test, this protocol was powered for detection of RNA subsequences embedded into putative long noncoding (lnc)RNAs expressed in prostate cancer. Through filtering of 1,179 candidates, we defined 21 lncRNAs that were further validated by NanoString for robust tumor-specific expression in 144 tissue specimens. Predictive modeling yielded a restricted probe panel enabling more than 90% of true-positive detections of cancer in an independent cohort from The Cancer Genome Atlas. Remarkably, this clinical signature made of only nine unannotated lncRNAs largely outperformed PCA3, the only prostate cancer lncRNA biomarker in use, in detection of high-risk tumors. This modular workflow is highly sensitive and can be applied to any pathology or clinical application.
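The general idea behind reference-free, k-mer based comparison can be illustrated with a short Python sketch. It is not DE-kupl and does not reproduce its statistics: the function names, the pseudocount, the `min_count` and `min_ratio` thresholds and the toy reads are assumptions made for illustration. Reads are split into k-mers without any alignment, and k-mers that are more abundant in case samples than in controls are reported.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def enriched_kmers(case_reads, control_reads, k=4, min_count=2, min_ratio=2.0):
    """Return {k-mer: ratio} for k-mers over-represented in the case reads.

    A pseudocount of 1 avoids division by zero for k-mers absent from controls.
    The thresholds are hypothetical and chosen only for this toy example.
    """
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    hits = {}
    for kmer, n in case.items():
        ratio = (n + 1) / (control.get(kmer, 0) + 1)
        if n >= min_count and ratio >= min_ratio:
            hits[kmer] = ratio
    return hits


# Toy reads (invented): the tumor reads differ from the normal reads by one base.
tumor_reads = ["ACGTTGCA", "ACGTTGCA"]
normal_reads = ["ACGTAGCA", "ACGTAGCA"]
for kmer, ratio in sorted(enriched_kmers(tumor_reads, normal_reads).items()):
    print(kmer, ratio)
```

Only the k-mers that overlap the variant position are reported, while the shared k-mer `ACGT` is filtered out. A realistic pipeline would additionally normalize counts by library size, treat a k-mer and its reverse complement consistently, and apply a proper statistical test across replicates.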
Affiliation(s)
- Marina Pinskaya
  - ncRNA, Epigenetic and Genome Fluidity, Université Paris Sciences & Lettres (PSL), Sorbonne Université, Centre National de la Recherche Scientifique (CNRS), Institut Curie, Research Center, Paris, France
- Zohra Saci
  - ncRNA, Epigenetic and Genome Fluidity, Université Paris Sciences & Lettres (PSL), Sorbonne Université, Centre National de la Recherche Scientifique (CNRS), Institut Curie, Research Center, Paris, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell, Commissariat à l'Energie Atomique, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif sur Yvette, France
- Marc Gabriel
  - ncRNA, Epigenetic and Genome Fluidity, Université Paris Sciences & Lettres (PSL), Sorbonne Université, Centre National de la Recherche Scientifique (CNRS), Institut Curie, Research Center, Paris, France
- Ha Tn Nguyen
  - Institute for Integrative Biology of the Cell, Commissariat à l'Energie Atomique, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif sur Yvette, France
  - Thuyloi University, Hanoi, Vietnam
- Virginie Firlej
  - Université Paris-Est Créteil, Créteil, France
  - Institut National de la Santé et de la Recherche Médicale, U955, Equipe 7, Créteil, France
- Marc Descrimes
  - ncRNA, Epigenetic and Genome Fluidity, Université Paris Sciences & Lettres (PSL), Sorbonne Université, Centre National de la Recherche Scientifique (CNRS), Institut Curie, Research Center, Paris, France
- Audrey Rapinat
  - Translational Research Department, Genomics Platform, Institut Curie, Université PSL, Paris, France
- David Gentien
  - Translational Research Department, Genomics Platform, Institut Curie, Université PSL, Paris, France
- Alexandre de la Taille
  - Université Paris-Est Créteil, Créteil, France
  - Institut National de la Santé et de la Recherche Médicale, U955, Equipe 7, Créteil, France
  - Assistance Publique - Hôpitaux de Paris, Hôpital Henri Mondor, Département d'Urologie, Créteil, France
- Arturo Londoño-Vallejo
  - Telomeres and Cancer, Université PSL, Sorbonne Université, CNRS, Institut Curie, Research Center, Paris, France
- Yves Allory
  - Compartimentation et Dynamique Cellulaire, Université PSL, Sorbonne Université, CNRS, Institut Curie, Research Center, Paris, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, Commissariat à l'Energie Atomique, CNRS, Université Paris-Sud, Université Paris-Saclay, Gif sur Yvette, France
- Antonin Morillon
  - ncRNA, Epigenetic and Genome Fluidity, Université Paris Sciences & Lettres (PSL), Sorbonne Université, Centre National de la Recherche Scientifique (CNRS), Institut Curie, Research Center, Paris, France
|