1
Saville L, Wu L, Habtewold J, Cheng Y, Gollen B, Mitchell L, Stuart-Edwards M, Haight T, Mohajerani M, Zovoilis A. NERD-seq: a novel approach of Nanopore direct RNA sequencing that expands representation of non-coding RNAs. Genome Biol 2024; 25:233. [PMID: 39198865 PMCID: PMC11351768 DOI: 10.1186/s13059-024-03375-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 08/20/2024] [Indexed: 09/01/2024] Open
Abstract
Non-coding RNAs (ncRNAs) are frequently documented RNA modification substrates. Nanopore Technologies enables the direct sequencing of RNAs and the detection of modified nucleobases. Ordinarily, direct RNA sequencing uses polyadenylation selection, studying primarily mRNA gene expression. Here, we present NERD-seq, which enables detection of multiple non-coding RNAs, excluded by the standard approach, alongside natively polyadenylated transcripts. Using neural tissues as a proof of principle, we show that NERD-seq expands representation of frequently modified non-coding RNAs, such as snoRNAs, snRNAs, scRNAs, srpRNAs, tRNAs, and rRFs. NERD-seq represents an RNA-seq approach to simultaneously study mRNA and ncRNA epitranscriptomes in brain tissues and beyond.
Affiliation(s)
- Luke Saville
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Li Wu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Jemaneh Habtewold
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Yubo Cheng
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Babita Gollen
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Liam Mitchell
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Matthew Stuart-Edwards
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Travis Haight
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Majid Mohajerani
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Athanasios Zovoilis
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada.
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada.
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada.
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada.
2
Byrne A, Le D, Sereti K, Menon H, Vaidya S, Patel N, Lund J, Xavier-Magalhães A, Shi M, Liang Y, Sterne-Weiler T, Modrusan Z, Stephenson W. Single-cell long-read targeted sequencing reveals transcriptional variation in ovarian cancer. Nat Commun 2024; 15:6916. [PMID: 39134520 PMCID: PMC11319652 DOI: 10.1038/s41467-024-51252-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
Single-cell RNA sequencing predominantly employs short-read sequencing to characterize cell types, states and dynamics; however, it is inadequate for comprehensive characterization of RNA isoforms. Long-read sequencing technologies enable single-cell RNA isoform detection but are hampered by lower throughput and unintended sequencing of artifacts. Here we develop Single-cell Targeted Isoform Long-Read Sequencing (scTaILoR-seq), a hybridization capture method which targets over a thousand genes of interest, improving the median number of on-target transcripts per cell by 29-fold. We use scTaILoR-seq to identify and quantify RNA isoforms from ovarian cancer cell lines and primary tumors, yielding 10,796 single-cell transcriptomes. Using long-read variant calling we reveal associations of expressed single nucleotide variants (SNVs) with alternative transcript structures. Phasing of SNVs across transcripts enables the measurement of allelic imbalance within distinct cell populations. Overall, scTaILoR-seq is a long-read targeted RNA sequencing method and analytical framework for exploring transcriptional variation at single-cell resolution.
Affiliation(s)
- Ashley Byrne
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Daniel Le
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Kostianna Sereti
- Department of Discovery Oncology, Genentech, South San Francisco, CA, USA
- Hari Menon
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Samir Vaidya
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Neha Patel
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Jessica Lund
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Ana Xavier-Magalhães
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Minyi Shi
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Yuxin Liang
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA
- Timothy Sterne-Weiler
- Department of Discovery Oncology, Genentech, South San Francisco, CA, USA
- Department of Oncology Bioinformatics, Genentech, South San Francisco, CA, USA
- Zora Modrusan
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA.
- William Stephenson
- Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, CA, USA.
3
Chen S, Meng J, Zhang Y. Quantitative profiling N1-methyladenosine (m1A) RNA methylation from Oxford nanopore direct RNA sequencing data. Methods 2024; 228:30-37. [PMID: 38768930 DOI: 10.1016/j.ymeth.2024.05.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/17/2024] [Accepted: 05/10/2024] [Indexed: 05/22/2024] Open
Abstract
With the recent direct RNA sequencing technique introduced by Oxford Nanopore Technologies, RNA modifications can be detected and profiled in a simple and straightforward manner. The majority of nanopore-based modification studies have been devoted to popular types such as m6A and pseudouridine. To address current limitations in studying a crucial regulator, the m1A modification, we conceived this study. We have developed an integrated computational workflow designed for the detection of m1A modifications from direct RNA sequencing data. This workflow comprises a feature extractor responsible for capturing signal characteristics (such as the mean, standard deviation, and length of electric signals), a single molecule-level m1A predictor trained with features extracted from the IVT dataset using classical machine learning algorithms, a confident m1A site selector employing the binomial test to identify statistically significant m1A sites, and an m1A modification rate estimator. Our model achieved accurate molecule-level prediction (average AUC = 0.9689) and reliable m1A site detection and quantification. To show the feasibility of our workflow, we conducted a study on the in vivo transcribed human HEK293 cell line, and the results were carefully annotated and compared with other techniques (i.e., Illumina sequencing-based techniques). We believe that this tool will enable a comprehensive understanding of the m1A modification and its functional mechanisms within cells and organisms.
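The site-selection and rate-estimation steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each read covering a site has already been labeled modified or unmodified by the molecule-level predictor, and the function names, the background rate of 0.05, and the significance threshold of 0.01 are hypothetical choices made only for illustration.

```python
from math import comb


def binomial_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_site(n_modified, n_reads, background_rate=0.05, alpha=0.01):
    """One-sided binomial test for a candidate site.

    n_modified: reads predicted as modified at this site
    n_reads: all reads covering the site
    background_rate: assumed false-positive rate of the read-level predictor
                     (hypothetical value, not taken from the abstract)
    alpha: assumed significance threshold (hypothetical value)
    """
    p_value = binomial_sf(n_modified, n_reads, background_rate)
    return {
        "rate": n_modified / n_reads,  # estimated modification rate
        "p_value": p_value,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    print(call_site(10, 20))  # many modified reads: strong evidence
    print(call_site(1, 20))   # consistent with background noise
```

In practice a multiple-testing correction across all candidate sites would likely be needed; the abstract does not say which one, if any, was used.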
Affiliation(s)
- Shenglun Chen
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jia Meng
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Yuxin Zhang
- Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom.
4
Grigorev K, Nelson TM, Overbey EG, Houerbi N, Kim J, Najjar D, Damle N, Afshin EE, Ryon KA, Thierry-Mieg J, Thierry-Mieg D, Melnick AM, Mateus J, Mason CE. Direct RNA sequencing of astronaut blood reveals spaceflight-associated m6A increases and hematopoietic transcriptional responses. Nat Commun 2024; 15:4950. [PMID: 38862496 PMCID: PMC11166648 DOI: 10.1038/s41467-024-48929-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 05/17/2024] [Indexed: 06/13/2024] Open
Abstract
The advent of civilian spaceflight challenges scientists to precisely describe the effects of spaceflight on human physiology, particularly at the molecular and cellular level. Newer, nanopore-based sequencing technologies can quantitatively map changes in chemical structure and expression at single molecule resolution across entire isoforms. We perform long-read, direct RNA nanopore sequencing, as well as Ultima high-coverage RNA-sequencing, of whole blood sampled longitudinally from four SpaceX Inspiration4 astronauts at seven timepoints, spanning pre-flight, day of return, and post-flight recovery. We report key genetic pathways, including changes in erythrocyte regulation, stress induction, and immune changes affected by spaceflight. We also present the first m6A methylation profiles for a human space mission, suggesting a significant spike in m6A levels immediately post-flight. These data and results represent the first longitudinal long-read RNA profiles and RNA modification maps for each gene for astronauts, improving our understanding of the human transcriptome's dynamic response to spaceflight.
Affiliation(s)
- Kirill Grigorev
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Theodore M Nelson
- Department of Microbiology and Immunology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Eliah G Overbey
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Center for STEM, University of Austin, Austin, TX, USA
- BioAstra, Inc, New York, NY, USA
- Nadia Houerbi
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- JangKeun Kim
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Deena Najjar
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Namita Damle
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Evan E Afshin
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Krista A Ryon
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Jean Thierry-Mieg
- National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Danielle Thierry-Mieg
- National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Ari M Melnick
- Department of Medicine, Weill Cornell Medicine, New York, NY, 10021, USA
- Jaime Mateus
- Space Exploration Technologies Corporation (SpaceX), Hawthorne, CA, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA.
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
- WorldQuant Initiative for Quantitative Prediction, New York, NY, USA.
5
Beletskiy A, Zolotar A, Fortygina P, Chesnokova E, Uroshlev L, Balaban P, Kolosov P. Downregulation of Ribosomal Protein Genes Is Revealed in a Model of Rat Hippocampal Neuronal Culture Activation with GABA(A)R/GlyRa2 Antagonist Picrotoxin. Cells 2024; 13:383. [PMID: 38474347 DOI: 10.3390/cells13050383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/13/2024] [Accepted: 02/20/2024] [Indexed: 03/14/2024] Open
Abstract
Long-read transcriptome sequencing provides us with a convenient tool for the thorough study of biological processes such as neuronal plasticity. Here, we aimed to perform transcriptional profiling of rat hippocampal primary neuron cultures after stimulation with picrotoxin (PTX) to further understand molecular mechanisms of neuronal activation. To overcome the limitations of short-read RNA-Seq approaches, we performed an Oxford Nanopore Technologies MinION-based long-read sequencing and transcriptome assembly of rat primary hippocampal culture mRNA at three time points after the PTX activation. We used a specific approach to exclude uncapped mRNAs during sample preparation. Overall, we found 23,652 novel transcripts in comparison to reference annotations, out of which ~6000 were entirely novel and mostly transposon-derived loci. Analysis of differentially expressed genes (DEG) showed that 3046 genes were differentially expressed, of which 2037 were upregulated and 1009 were downregulated at 30 min after the PTX application, with only 446 and 13 genes differentially expressed at 1 h and 5 h time points, respectively. Most notably, multiple genes encoding ribosomal proteins, with a high basal expression level, were downregulated after 30 min incubation with PTX; we suggest that this indicates redistribution of transcriptional resources towards activity-induced genes. Novel loci and isoforms observed in this study may help us further understand the functional mRNA repertoire in neuronal plasticity processes. Together with other NGS techniques, differential gene expression analysis of sequencing data obtained using MinION platform might provide a simple method to optimize further study of neuronal plasticity.
Affiliation(s)
- Alexander Beletskiy
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Anastasia Zolotar
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Polina Fortygina
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Ekaterina Chesnokova
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Leonid Uroshlev
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Pavel Balaban
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Peter Kolosov
- Institute of Higher Nervous Activity and Neurophysiology, The Russian Academy of Sciences, 117485 Moscow, Russia
- Engelhardt Institute of Molecular Biology, The Russian Academy of Sciences, 119991 Moscow, Russia
6
Maina S, Norton SL, Rodoni BC. Hybrid RNA sequencing of broad bean wilt virus 2 from faba beans. Microbiol Spectr 2023; 11:e0266323. [PMID: 37823658 PMCID: PMC10714761 DOI: 10.1128/spectrum.02663-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 09/01/2023] [Indexed: 10/13/2023] Open
Abstract
IMPORTANCE Globally, viral diseases impair the growth and vigor of cultivated crops such as grains, leading to a significant reduction in quality, marketability, and competitiveness. As an island nation, Australia has a distinct advantage in using its border to prevent the introduction of damaging viruses, which threaten the continental agricultural sector. However, breeding programs in Australia rely on imported seeds as new sources of genetic diversity. As such, it is critical to remain vigilant in identifying new and emerging viral pathogens, by ensuring the availability of accurate genomic diagnostic tools at the grain biosecurity border. High-throughput sequencing offers game-changing opportunities in biosecurity routine testing. Genomic results are more accurate and informative compared to traditional molecular methods or biological indexing. The present work contributes to strengthening accurate phytosanitary screening, to safeguard the Australian grains industry, and expedite germplasm release to the end users.
Affiliation(s)
- Solomon Maina
- NSW Department of Primary Industries, Biosecurity & Food Safety, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, NSW, Australia
- Australian Grains Genebank, Agriculture Victoria, Horsham, Victoria, Australia
- Sally L. Norton
- Australian Grains Genebank, Agriculture Victoria, Horsham, Victoria, Australia
- Brendan C. Rodoni
- Microbial Sciences, Pests & Diseases, Agriculture Victoria, AgriBio, Ring Road, Bundoora, Victoria, Australia
- School of Applied Systems Biology (SASB), La Trobe University, Bundoora, Victoria, Australia
7
Xu R, Prakoso D, Salvador LCM, Rajeev S. Leptospira transcriptome sequencing using long-read technology reveals unannotated transcripts and potential polyadenylation of RNA molecules. Microbiol Spectr 2023; 11:e0223423. [PMID: 37861327 PMCID: PMC10715090 DOI: 10.1128/spectrum.02234-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 09/11/2023] [Indexed: 10/21/2023] Open
Abstract
IMPORTANCE Leptospirosis, caused by the spirochete bacteria Leptospira, is a zoonotic disease of humans and animals, accounting for over 1 million annual human cases and over 60,000 deaths. We have characterized operon transcriptional units, identified novel RNA coding regions, and reported evidence of potential posttranscriptional polyadenylation in the Leptospira transcriptomes for the first time using Oxford Nanopore Technology RNA sequencing protocols. The newly identified RNA coding regions and operon transcriptional units were detected only in the pathogenic Leptospira transcriptomes, suggesting their significance in virulence-related functions. This article integrates bioinformatics, infectious diseases, microbiology, molecular biology, veterinary sciences, and public health. Given the current knowledge gap in the regulation of leptospiral pathogenicity, our findings offer valuable insights to researchers studying leptospiral pathogenicity and provide both a basis and a tool for researchers focusing on prokaryotic molecular studies for the understanding of RNA compositions and prokaryotic polyadenylation for their organisms of interest.
Affiliation(s)
- Ruijie Xu
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, USA
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, USA
- Dhani Prakoso
- Department of Biomedical and Diagnostic Sciences, College of Veterinary Medicine, University of Tennessee, Knoxville, Tennessee, USA
- Liliana C. M. Salvador
- Institute of Bioinformatics, University of Georgia, Athens, Georgia, USA
- Center for the Ecology of Infectious Diseases, University of Georgia, Athens, Georgia, USA
- Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia, USA
- Sreekumari Rajeev
- Department of Biomedical and Diagnostic Sciences, College of Veterinary Medicine, University of Tennessee, Knoxville, Tennessee, USA
8
Schuster J, Ritchie ME, Gouil Q. Restrander: rapid orientation and artefact removal for long-read cDNA data. NAR Genom Bioinform 2023; 5:lqad108. [PMID: 38143957 PMCID: PMC10748469 DOI: 10.1093/nargab/lqad108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 11/07/2023] [Accepted: 12/14/2023] [Indexed: 12/26/2023] Open
Abstract
In transcriptomic analyses, it is helpful to keep track of the strand of the RNA molecules. However, the Oxford Nanopore long-read cDNA sequencing protocols generate reads that correspond to either the first or second-strand cDNA, therefore the strandedness of the initial transcript has to be inferred bioinformatically. Reverse transcription and PCR can also introduce artefacts which should be flagged in data pre-processing. Here we introduce Restrander, a lightning-fast and highly accurate tool for restranding and removing artefacts in long-read cDNA sequencing data. Thanks to its C++ implementation, Restrander was faster than Oxford Nanopore Technologies' existing tool Pychopper, and correctly restranded more reads due to its strategy of searching for polyA/T tails in addition to primer sequences from the reverse transcription and template-switch steps. We found that restranding improved the process of visualising and exploring data, and increased the number of novel isoforms discovered by bambu, particularly in regions where sense and anti-sense transcripts co-occur. The artefact detection implemented in Restrander quantifies reads lacking the correct 5' and 3' ends, a useful feature in quality control for library preparation. Restrander is pre-configured for all major cDNA protocols, and can be customised with user-defined primers. Restrander is available at https://github.com/mritchielab/restrander.
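The polyA/T part of the restranding strategy described in this abstract can be sketched roughly as follows. This is not Restrander's code or interface (the tool is written in C++ and also searches for primer sequences, which are omitted here). The function names, the minimum tail length of 12, and the 30-base search window are assumptions chosen for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def infer_strand(read, min_tail=12, window=30):
    """Guess read orientation from a polyA tail or a polyT head.

    Returns '+' if a polyA run is found near the 3' end only,
    '-' if a polyT run is found near the 5' end only,
    and '?' if the evidence is missing or contradictory.
    min_tail and window are hypothetical parameters.
    """
    has_poly_a = "A" * min_tail in read[-window:].upper()
    has_poly_t = "T" * min_tail in read[:window].upper()
    if has_poly_a and not has_poly_t:
        return "+"
    if has_poly_t and not has_poly_a:
        return "-"
    return "?"


def restrand(read):
    """Return (read in transcript orientation, inferred strand)."""
    strand = infer_strand(read)
    if strand == "-":
        return reverse_complement(read), strand
    return read, strand


if __name__ == "__main__":
    forward = "GATTACAGGC" + "A" * 15
    print(restrand(forward))
    print(restrand(reverse_complement(forward)))
    print(restrand("ACGTACGT"))
```

Reads returned with '?' correspond loosely to the reads lacking the expected ends that the abstract says are flagged for quality control.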
Affiliation(s)
- Jakob Schuster
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
- Matthew E Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
- Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC 3010, Australia
9
Dong X, Du MRM, Gouil Q, Tian L, Jabbari JS, Bowden R, Baldoni PL, Chen Y, Smyth GK, Amarasinghe SL, Law CW, Ritchie ME. Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Nat Methods 2023; 20:1810-1821. [PMID: 37783886 DOI: 10.1038/s41592-023-02026-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 07/22/2022] [Accepted: 08/25/2023] [Indexed: 10/04/2023]
Abstract
The lack of benchmark data sets with inbuilt ground-truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground-truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed other tools from the six isoform detection tools tested, DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested and there was no clear front-runner for performing differential transcript usage analysis between the five tools compared, which suggests further methods development is needed for this application.
Affiliation(s)
- Xueyi Dong
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.
- Mei R M Du
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Quentin Gouil
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Luyi Tian
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Guangzhou National Laboratory, Guangzhou, China
- Jafar S Jabbari
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Rory Bowden
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Pedro L Baldoni
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Yunshun Chen
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Gordon K Smyth
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia
- Shanika L Amarasinghe
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- The Australian Regenerative Medicine Institute, Monash University, Clayton, Victoria, Australia
- Charity W Law
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia
- Matthew E Ritchie
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
- Department of Medical Biology, The University of Melbourne, Parkville, Victoria, Australia.
10
Engelhard CA, Khani S, Derdak S, Bilban M, Kornfeld JW. Nanopore sequencing unveils the complexity of the cold-activated murine brown adipose tissue transcriptome. iScience 2023; 26:107190. [PMID: 37564700 PMCID: PMC10410515 DOI: 10.1016/j.isci.2023.107190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 04/28/2023] [Accepted: 06/16/2023] [Indexed: 08/12/2023] Open
Abstract
Alternative transcription increases transcriptome complexity by expression of multiple transcripts per gene. Annotation and quantification of transcripts using short-read sequencing is non-trivial. Long-read sequencing aims at overcoming these problems by sequencing full-length transcripts. Activation of brown adipose tissue (BAT) thermogenesis involves major transcriptomic remodeling and positively affects metabolism via increased energy expenditure. We benchmark Oxford Nanopore Technology (ONT) long-read sequencing protocols to Illumina short-read sequencing assessing alignment characteristics, gene and transcript detection and quantification, differential gene and transcript expression, transcriptome reannotation, and differential transcript usage (DTU). We find ONT sequencing is superior to Illumina for transcriptome reassembly, reducing the risk of false-positive events by unambiguously mapping reads to transcripts. We identified novel isoforms of genes undergoing DTU in cold-activated BAT including Cars2, Adtrp, Acsl5, Scp2, Aldoa, and Pde4d, validated by real-time PCR. The reannotated murine BAT transcriptome established here provides a framework for future investigations into the regulation of BAT.
Affiliation(s)
- Christoph Andreas Engelhard
  - Department for Biochemistry and Molecular Biology (BMB), University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
- Sajjad Khani
  - Max Planck Institute for Metabolism Research, Gleueler Strasse 50, 50931 Cologne, Germany
  - Cologne Excellence Cluster on Cellular Stress Responses in Ageing-Associated Diseases (CECAD), University of Cologne, Cologne, Germany
- Sophia Derdak
  - Core Facilities, Medical University of Vienna, Lazarettgasse 14, 1090 Vienna, Austria
- Martin Bilban
  - Department of Laboratory Medicine & Core Facilities, Medical University of Vienna, Waehringer Guertel 18-20, 1090 Vienna, Austria
- Jan-Wilhelm Kornfeld
  - Department for Biochemistry and Molecular Biology (BMB), University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
11
Hughes AEO, Montgomery MC, Liu C, Weimer ET. Allele-specific quantification of human leukocyte antigen transcript isoforms by nanopore sequencing. Front Immunol 2023; 14:1199618. [PMID: 37662944 PMCID: PMC10471969 DOI: 10.3389/fimmu.2023.1199618] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/17/2023] [Accepted: 07/05/2023] [Indexed: 09/05/2023] Open
Abstract
Introduction: While tens of thousands of HLA alleles have been identified by DNA sequencing, the contribution of alternative splicing to HLA diversity is not well characterized. In this study, we sought to determine if long-read sequencing could be used to accurately quantify allele-specific HLA transcripts in primary human lymphocytes.
Methods: cDNA libraries were prepared from peripheral blood lymphocytes from 12 donors and sequenced by nanopore long-read sequencing. HLA reads were aligned to donor-specific reference sequences based on the known type of each donor. Allele-specific exon utilization was calculated as the proportion of reads aligning to each allele containing known exons, and transcript isoforms were quantified based on patterns of exon utilization within individual reads.
Results: Splice variants were rare among class I HLA genes (median exon retention rate 99%-100%), except for several HLA-C alleles with exon 5 spliced out of up to 15% of reads. Splice variants were also rare among class II HLA genes (median exon retention rate 98%-100%), except for HLA-DQB1. Consistent with previous work, exon 5 of HLA-DQB1 was spliced out in alleles with a mutated splice acceptor site at rs28688207. Surprisingly, a 28% loss of exon 5 was also observed in HLA-DQB1 alleles with an intact splice acceptor site at rs28688207.
Discussion: We describe a simple bioinformatic workflow to quantify allele-specific expression of HLA transcript isoforms. Further studies are warranted to characterize the repertoire of HLA transcripts expressed in different cell types and tissues across diverse populations.
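The exon-utilization calculation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the input format (one `(allele, set of exon numbers)` pair per aligned read), the function names, and the toy reads are all assumptions made for illustration, and the alignment step that would produce such pairs is not shown.

```python
from collections import defaultdict


def exon_utilization(reads):
    """Per allele, return the fraction of its reads that contain each exon.

    `reads` is an iterable of (allele, exons) pairs, where `exons` is the set
    of exon numbers covered by one read (a hypothetical input format).
    """
    totals = defaultdict(int)
    with_exon = defaultdict(lambda: defaultdict(int))
    for allele, exons in reads:
        totals[allele] += 1
        for exon in exons:
            with_exon[allele][exon] += 1
    return {
        allele: {exon: n / totals[allele] for exon, n in sorted(counts.items())}
        for allele, counts in with_exon.items()
    }


def isoform_counts(reads):
    """Count reads per (allele, exon pattern); each pattern stands for one isoform."""
    counts = defaultdict(int)
    for allele, exons in reads:
        counts[(allele, tuple(sorted(exons)))] += 1
    return dict(counts)


# Toy data (invented): one of three allele_A reads lacks exon 5.
toy_reads = [
    ("allele_A", {1, 2, 3, 4, 5}),
    ("allele_A", {1, 2, 3, 4, 5}),
    ("allele_A", {1, 2, 3, 4}),
    ("allele_B", {1, 2, 3, 4, 5}),
]
utilization = exon_utilization(toy_reads)
isoforms = isoform_counts(toy_reads)
```

In this toy input, exon 5 is present in two of the three `allele_A` reads, so its utilization for that allele is 2/3, and the read without exon 5 is counted as a separate isoform pattern.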
Affiliation(s)
- Andrew E. O. Hughes
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
- Maureen C. Montgomery
  - Molecular Immunology Laboratory, McLendon Clinical Laboratories, University of North Carolina Hospitals, Chapel Hill, NC, United States
- Chang Liu
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, United States
- Eric T. Weimer
  - Molecular Immunology Laboratory, McLendon Clinical Laboratories, University of North Carolina Hospitals, Chapel Hill, NC, United States
  - Department of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, United States
12
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Advanced Genetics (Hoboken, N.J.) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays are used to study the bulk transcriptome; but now short-read RNA sequencing (RNA-seq) predominates. Positioned as a superior technology, that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal long noncoding RNAs (lncRNA) are neither sparsely nor lower expressed than protein coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq, undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of the modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
  - Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
- Haoning Howard Cen
  - Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- Iain J Gallagher
  - School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
- James D. Johnson
  - Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
- James A. Timmons
  - Miller School of Medicine, University of Miami, Miami, FL 33136, USA
  - William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
  - Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
13
Prawer YDJ, Gleeson J, De Paoli-Iseppi R, Clark MB. Pervasive effects of RNA degradation on Nanopore direct RNA sequencing. NAR Genom Bioinform 2023; 5:lqad060. [PMID: 37305170 PMCID: PMC10251640 DOI: 10.1093/nargab/lqad060] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2022] [Revised: 04/18/2023] [Accepted: 06/07/2023] [Indexed: 06/13/2023] Open
Abstract
Oxford Nanopore direct RNA sequencing (DRS) is capable of sequencing complete RNA molecules and accurately measuring gene and isoform expression. However, as DRS is designed to profile intact RNA, expression quantification may be more heavily dependent upon RNA integrity than alternative RNA sequencing methodologies. It is currently unclear how RNA degradation impacts DRS or whether it can be corrected for. To assess the impact of RNA integrity on DRS, we performed a degradation time series using SH-SY5Y neuroblastoma cells. Our results demonstrate that degradation is a significant and pervasive factor that can bias DRS measurements, including a reduction in library complexity resulting in an overrepresentation of short genes and isoforms. Degradation also biases differential expression analyses; however, we find that explicit correction can almost fully recover meaningful biological signal. In addition, DRS provided less biased profiling of partially degraded samples than Nanopore PCR-cDNA sequencing. Overall, we find that samples with RNA integrity number (RIN) > 9.5 can be treated as undegraded and samples with RIN > 7 can be utilized for DRS with appropriate correction. These results establish the suitability of DRS for a wide range of samples, including partially degraded in vivo clinical and post-mortem samples, while limiting the confounding effect of degradation on expression quantification.
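The RIN thresholds stated in this abstract can be written as a small triage function. This is only a sketch: the function name and category labels are hypothetical, and the abstract does not say how samples with RIN of 7 or below should be handled, so they are returned as a separate, unspecified category rather than being called unusable.

```python
def classify_rin(rin):
    """Map an RNA integrity number (RIN) to the categories named in the abstract.

    Labels are illustrative. The abstract only covers RIN > 9.5 and RIN > 7;
    anything else is reported as outside the stated thresholds.
    """
    if rin > 9.5:
        return "undegraded"
    if rin > 7:
        return "usable_with_correction"
    return "outside_stated_thresholds"


# Example values chosen for illustration only.
labels = [classify_rin(r) for r in (9.8, 8.2, 6.5)]
```

The explicit degradation correction that the abstract refers to is not reproduced here; the sketch only shows which samples the stated cut-offs would route to it.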
Affiliation(s)
- Yair D J Prawer
  - Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Josie Gleeson
  - Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Ricardo De Paoli-Iseppi
  - Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Michael B Clark
  - To whom correspondence should be addressed. Tel: +61 3 9035 3669;
14
Li J, Guan D, Halstead MM, Islas-Trejo AD, Goszczynski DE, Ernst CW, Cheng H, Ross P, Zhou H. Transcriptome annotation of 17 porcine tissues using nanopore sequencing technology. Anim Genet 2023; 54:35-44. [PMID: 36385508 DOI: 10.1111/age.13274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 10/20/2022] [Accepted: 11/01/2022] [Indexed: 11/18/2022]
Abstract
The annotation of animal genomes plays an important role in elucidating molecular mechanisms behind the genetic control of economically important traits. Here, we employed long-read sequencing technology, Oxford Nanopore Technology, to annotate the pig transcriptome across 17 tissues from two Yorkshire littermate pigs. More than 9.8 million reads were obtained from a single flow cell, and 69 781 unique transcripts at 50 108 loci were identified. Of these transcripts, 16 255 were found to be novel isoforms, and 22 344 were found at loci that were novel and unannotated in the Ensembl (release 102) and NCBI (release 106) annotations. Novel transcripts were mostly expressed in cerebellum, followed by lung, liver, spleen, and hypothalamus. By comparing the unannotated transcripts to existing databases, there were 21 285 (95.3%) transcripts matched to the NT database (v5) and 13 676 (61.2%) matched to the NR database (v5). Moreover, there were 4324 (19.4%) transcripts matched to the SwissProt database (v5), corresponding to 11 356 proteins. Tissue-specific gene expression analyses showed that 9749 transcripts were highly tissue-specific, and cerebellum contained the most tissue-specific transcripts. As the same samples were used for the annotation of cis-regulatory elements in the pig genome, the transcriptome annotation generated by this study provides an additional and complementary annotation resource for the Functional Annotation of Animal Genomes effort to comprehensively annotate the pig genome.
Affiliation(s)
- Jinghui Li
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Dailu Guan
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Michelle M Halstead
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Alma D Islas-Trejo
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Daniel E Goszczynski
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Catherine W Ernst
  - Department of Animal Science, Michigan State University, East Lansing, Michigan, USA
- Hao Cheng
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Pablo Ross
  - Department of Animal Science, University of California Davis, Davis, California, USA
- Huaijun Zhou
  - Department of Animal Science, University of California Davis, Davis, California, USA
15
Cozzuto L, Delgado-Tejedor A, Hermoso Pulido T, Novoa EM, Ponomarenko J. Nanopore Direct RNA Sequencing Data Processing and Analysis Using MasterOfPores. Methods Mol Biol 2023; 2624:185-205. [PMID: 36723817 DOI: 10.1007/978-1-0716-2962-8_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
This chapter describes MasterOfPores v.2 (MoP2), an open-source suite of pipelines for processing and analyzing direct RNA Oxford Nanopore sequencing data. The MoP2 relies on the Nextflow DSL2 framework and Linux containers, thus enabling reproducible data analysis in transcriptomic and epitranscriptomic studies. We introduce the key concepts of MoP2 and provide a step-by-step fully reproducible and complete example of how to use the workflow for the analysis of S. cerevisiae total RNA samples sequenced using MinION flowcells. The workflow starts with the pre-processing of raw FAST5 files, which includes basecalling, read quality control, demultiplexing, filtering, mapping, estimation of per-gene/transcript abundances, and transcriptome assembly, with support of the GPU computing for the basecalling and read demultiplexing steps. The secondary analyses of the workflow focus on the estimation of RNA poly(A) tail lengths and the identification of RNA modifications. The MoP2 code is available at https://github.com/biocorecrg/MOP2 and is distributed under the MIT license.
Affiliation(s)
- Luca Cozzuto
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Anna Delgado-Tejedor
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Toni Hermoso Pulido
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Eva Maria Novoa
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Julia Ponomarenko
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
16
Orabi B, Xie N, McConeghy B, Dong X, Chauve C, Hach F. Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing. Nucleic Acids Res 2022; 51:e11. [PMID: 36478271 PMCID: PMC9881145 DOI: 10.1093/nar/gkac1112] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/19/2022] [Revised: 10/26/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022] Open
Abstract
Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from being comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely spans more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing presents a promising potential for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. It then partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using integer linear programming (ILP).
We compare the performance of Freddie on simulated datasets with other isoform detection tools with varying dependence on annotation databases. We show that Freddie outperforms the other tools in its accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie results in isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
Affiliation(s)
- Baraa Orabi
  - Department of Computer Science, the University of British Columbia, Vancouver, BC, Canada
- Ning Xie
  - Vancouver Prostate Centre, Vancouver, BC, Canada
- Xuesen Dong
  - Vancouver Prostate Centre, Vancouver, BC, Canada
  - Department of Urologic Sciences, the University of British Columbia, Vancouver, BC, Canada
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Faraz Hach
  - To whom correspondence should be addressed.
17
Walter M, Puniamoorthy N. Discovering novel reproductive genes in a non-model fly using de novo GridION transcriptomics. Front Genet 2022; 13:1003771. [PMID: 36568389 PMCID: PMC9768217 DOI: 10.3389/fgene.2022.1003771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/26/2022] [Accepted: 11/16/2022] [Indexed: 12/12/2022] Open
Abstract
Gene discovery has important implications for investigating phenotypic trait evolution, adaptation, and speciation. Male reproductive tissues, such as accessory glands (AGs), are hotspots for recruitment of novel genes that diverge rapidly even among closely related species/populations. These genes synthesize seminal fluid proteins that often affect post-copulatory sexual selection: they can mediate male-male sperm competition, ejaculate-female interactions that modify female remating and even influence reproductive incompatibilities among diverging species/populations. Although de novo transcriptomics has facilitated gene discovery in non-model organisms, reproductive gene discovery is still challenging without a reference database as they are often novel and bear no homology to known proteins. Here, we use reference-free GridION long-read transcriptomics, from Oxford Nanopore Technologies (ONT), to discover novel AG genes and characterize their expression in the widespread dung fly, Sepsis punctum. Despite stark population differences in male reproductive traits (e.g., body size, testes size, and sperm length) as well as female re-mating, the male AG genes and their secretions of S. punctum are still unknown. We implement a de novo ONT transcriptome pipeline incorporating quality-filtering and rigorous error-correction procedures, and we evaluate gene sequence and gene expression results against high-quality Illumina short-read data. We discover highly-expressed reproductive genes in AG transcriptomes of S. punctum consisting of 40 high-quality and high-confidence ONT genes that cross-verify against Illumina genes, among which 26 are novel and specific to S. punctum. Novel genes account for an average of 81% of total gene expression and may be functionally relevant in seminal fluid protein production. For instance, 80% of genes encoding secretory proteins account for 74% total gene expression.
In addition, median sequence similarities of ONT nucleotide and protein sequences match within-Illumina sequence similarities. Read-count based expression quantification in ONT is congruent with Illumina's Transcript per Million (TPM), both in overall pattern and within functional categories. Rapid genomic innovation followed by recruitment of de novo genes for high expression in S. punctum AG tissue, a pattern observed in other insects, could be a likely mechanism of evolution of these genes. The study also demonstrates the feasibility of adapting ONT transcriptomics for gene discovery in non-model systems.
18
Bonenfant Q, Noé L, Touzet H. Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances 2022; 3:vbac085. [PMID: 36698762 PMCID: PMC9869717 DOI: 10.1093/bioadv/vbac085] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 06/30/2022] [Revised: 10/07/2022] [Indexed: 11/23/2022]
Abstract
Motivation: Oxford Nanopore Technologies (ONT) sequencing has become very popular over the past few years and offers a cost-effective solution for many genomic and transcriptomic projects. One distinctive feature of the technology is that the protocol includes the ligation of adapters to both ends of each fragment. Those adapters should then be removed before downstream analyses, either during the basecalling step or by explicit trimming. This basic task may be tricky when the definition of the adapter sequence is not well documented.
Results: We have developed a new method to scan a set of ONT reads to see if it contains adapters, without any prior knowledge on the sequence of the potential adapters, and then trim out those adapters. The algorithm is based on approximate k-mers and is able to discover adapter sequences based on their frequency alone. The method was successfully tested on a variety of ONT datasets with different flowcells, sequencing kits and basecallers.
Availability and implementation: The resulting software, named Porechop_ABI, is open-source and is available at https://github.com/bonsai-team/Porechop_ABI.
Supplementary information: Supplementary data are available at Bioinformatics advances online.
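The idea of finding adapters from frequency alone can be illustrated with a deliberately simplified Python sketch. It counts exact k-mers in the first bases of each read, whereas the abstract describes approximate k-mers, and it does not assemble or trim anything, so it should not be read as the Porechop_ABI algorithm. The function name, parameters, adapter string, and toy reads are all invented for illustration.

```python
from collections import Counter


def frequent_start_kmers(reads, k=8, window=30, top=5):
    """Return the k-mers found at the start of the most reads.

    Each k-mer is counted at most once per read, so a count equal to the
    number of reads means the k-mer occurs near the start of every read.
    """
    counts = Counter()
    for read in reads:
        prefix = read[:window]
        seen = {prefix[i:i + k] for i in range(len(prefix) - k + 1)}
        counts.update(seen)
    return counts.most_common(top)


# Toy reads: a made-up 12-base "adapter" followed by unrelated inserts.
adapter = "ACGTACGTTGCA"
toy_reads = [
    adapter + "TTTTGGGGCCCCAAAATTTTGGGG",
    adapter + "GGCATCGATCGATTAGCTAGCTAA",
    adapter + "CCATATATGCGCGCTATAGCGCAT",
]
top_kmers = frequent_start_kmers(toy_reads, k=8, window=30, top=5)
```

In this toy set the five 8-mers contained in the shared prefix are the only ones present in all three reads, so they are the ones returned; overlapping such k-mers would be one way to recover the full adapter, which this sketch leaves out.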
Affiliation(s)
- Quentin Bonenfant
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL—Centre de Recherche en Informatique Signal et Automatique de Lille, Lille F-59000, France
- Laurent Noé
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL—Centre de Recherche en Informatique Signal et Automatique de Lille, Lille F-59000, France
19
Yap M, O’Sullivan O, O’Toole PW, Cotter PD. Development of sequencing-based methodologies to distinguish viable from non-viable cells in a bovine milk matrix: A pilot study. Front Microbiol 2022; 13:1036643. [PMID: 36466696 PMCID: PMC9713316 DOI: 10.3389/fmicb.2022.1036643] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/04/2022] [Accepted: 10/28/2022] [Indexed: 04/22/2024] Open
Abstract
Although high-throughput DNA sequencing-based methods have been of great value for determining the composition of microbial communities in various environments, there is the potential for inaccuracies arising from the sequencing of DNA from dead microorganisms. In this pilot study, we compared different sequencing-based methods to assess their relative accuracy with respect to distinguishing between viable and non-viable cells, using a live and heat-inactivated model community spiked into bovine milk. The methods used were shotgun metagenomics with and without propidium monoazide (PMA) treatment, RNA-based 16S rRNA sequencing and metatranscriptomics. The results showed that methods were generally accurate, though significant differences were found depending on the library types and sequencing technologies. Different molecular targets were the basis for variations in the results generated using different library types, while differences in the derived composition data from Oxford Nanopore Technologies- and Illumina-based sequencing likely reflect a combination of different sequencing depths, error rates and bioinformatics pipelines. Although PMA was successfully applied in this study, further optimisation is required before it can be applied in a more universal context for complex microbiomes. Overall, these methods show promise and represent another important step towards the ultimate establishment of approaches that can be applied to accurately identify live microorganisms in milk and other food niches.
Affiliation(s)
- Min Yap
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- Orla O’Sullivan
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Paul W. O’Toole
  - School of Microbiology, University College Cork, Cork, Ireland
  - APC Microbiome Ireland, Cork, Ireland
- Paul D. Cotter
  - Teagasc Food Research Centre, Moorepark, Fermoy, Ireland
  - APC Microbiome Ireland, Cork, Ireland
20
Bayega A, Oikonomopoulos S, Wang YC, Ragoussis J. Improved Nanopore full-length cDNA sequencing by PCR-suppression. Front Genet 2022; 13:1031355. [PMID: 36324505 PMCID: PMC9618600 DOI: 10.3389/fgene.2022.1031355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 09/30/2022] [Indexed: 11/29/2022] Open
Abstract
Full-length transcript sequencing remains a main goal of RNA sequencing. However, even the application of long-read sequencing technologies such as Oxford Nanopore Technologies still fails to yield full-length transcript sequencing for a significant portion of sequenced reads. Since these technologies can sequence reads that are far longer than the longest known processed transcripts, the lack of efficiency to obtain full-length transcripts from good quality RNAs stems from library preparation inefficiency rather than the presence of degraded RNA molecules. It has previously been shown that addition of inverted terminal repeats in cDNA during reverse transcription followed by single-primer PCR creates a PCR suppression effect that prevents amplification of short molecules, thus enriching the library for longer transcripts. We adapted this method for Nanopore cDNA library preparation and show that not only is PCR efficiency increased but gene body coverage is dramatically improved. The results show that implementation of this simple strategy will result in better quality full-length RNA sequencing data and make full-length transcript sequencing possible for most sequenced reads.
Affiliation(s)
- Anthony Bayega
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Spyros Oikonomopoulos
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Yu Chang Wang
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Jiannis Ragoussis
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
  - Department of Bioengineering, McGill University, Montréal, QC, Canada
21
Tung KF, Lin WC. TEx-MST: tissue expression profiles of MANE select transcripts. Database (Oxford) 2022; 2022:6726258. [PMID: 36170113 PMCID: PMC9518666 DOI: 10.1093/database/baac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 12/05/2022]
Abstract
Recently, a new reference transcript dataset [Matched Annotation from the NCBI and EMBL-EBI (MANE) select] was released by NCBI and EMBL-EBI to make available a new unified representative transcript for human protein-coding genes. While the main purpose of the MANE project is to provide a harmonized gene and transcript information standard, there is no explicit tissue expression information about these MANE select transcripts. In this report, we tried to provide useful expression profiles of MANE select transcripts in various normal human tissues to allow further interrogation of their molecular modulations and functional significance. We obtained the new V9 transcript expression dataset from the Genotype-Tissue Expression (GTEx) web portal. This new GTEx dataset, based on a long-read sequencing platform, affords better assessment of the expression of alternative spliced transcripts. This tissue expression profiles of MANE select transcripts (TEx-MST) database not only provides the basic information of MANE select transcripts but also tissue expression profiles on alternative transcripts in protein-coding genes. Users can initiate the interrogation by gene symbol searches or by browsing the MANE genes with various criteria (such as genome locations or expression rankings). We further utilized the GENCODE biotype feature to identify the top-ranked protein-coding transcripts by choosing the most expressed protein-coding transcripts from GTEx datasets (both V8 and V9 datasets). In summary, there are 18 083 genes matched between MANE and GTEx. Among them, 13 245 MANE select transcripts matched with the top-ranked protein-coding transcripts in GTEx V9 dataset, which underlined the dominant expression of MANE select transcripts. This TEx-MST web bioinformatic database provides a visualized user interface for the normal tissue expression patterns of MANE select transcripts using the newly released GTEx dataset.
Database URL: TEx-MST is available at https://texmst.ibms.sinica.edu.tw/
Affiliation(s)
- Kuo-Feng Tung
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Wen-chang Lin
  - Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
  - Institute of Biomedical Informatics, National Yang-Ming Chiao Tung University, Taipei 112, Taiwan, R.O.C
22
Leshkowitz D, Kedmi M, Fried Y, Pilzer D, Keren-Shaul H, Ainbinder E, Dassa B. Exploring differential exon usage via short- and long-read RNA sequencing strategies. Open Biol 2022; 12:220206. [PMID: 36168804 PMCID: PMC9516339 DOI: 10.1098/rsob.220206] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022] Open
Abstract
Alternative splicing produces various mRNAs, and thereby various protein products, from one gene, impacting a wide range of cellular activities. However, accurate reconstruction and quantification of full-length transcripts using short reads is limited, due to their length. Long-read sequencing technologies may provide a solution by sequencing full-length transcripts. We explored the use of both Illumina short reads and two Oxford Nanopore Technology long-read RNA-Seq protocols (cDNA and Direct RNA) for detecting global differential splicing during mouse embryonic stem cell differentiation, applying several bioinformatics strategies: gene-based, isoform-based and exon-based. We detected the strongest similarity among the sequencing platforms at the gene level compared to exon-based and isoform-based. Furthermore, the exon-based strategy discovered many differential exon usage (DEU) events, mostly in a platform-dependent manner and in non-differentially expressed genes. Thus, the platforms complemented each other in the ability to detect DEUs (i.e. long reads exhibited an advantage in detecting DEUs at the UTRs, and short reads detected more DEUs). Exons within 20 genes, detected in one or more platforms, were here validated by PCR, including key differentiation genes, such as Mdb3 and Aplp1. We provide an important analysis resource for discovering transcriptome changes during stem cell differentiation and insights for analysing such data.
Affiliation(s)
- Dena Leshkowitz
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Merav Kedmi
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Yael Fried
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - David Pilzer
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Hadas Keren-Shaul
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Elena Ainbinder
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Bareket Dassa
- Life Sciences Core Facilities, Weizmann Institute of Science, Rehovot 76100, Israel
| |
|
23
|
de la Rubia I, Srivastava A, Xue W, Indi JA, Carbonell-Sala S, Lagarde J, Albà MM, Eyras E. RATTLE: reference-free reconstruction and quantification of transcriptomes from Nanopore sequencing. Genome Biol 2022; 23:153. [PMID: 35804393 PMCID: PMC9264490 DOI: 10.1186/s13059-022-02715-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/05/2021] [Accepted: 06/20/2022] [Indexed: 11/04/2022] Open
Abstract
Nanopore sequencing enables the efficient and unbiased measurement of transcriptomes. Current methods for transcript identification and quantification rely on mapping reads to a reference genome, which precludes the study of species with a partial or missing reference or the identification of disease-specific transcripts not readily identifiable from a reference. We present RATTLE, a tool to perform reference-free reconstruction and quantification of transcripts using only Nanopore reads. Using simulated data and experimental data from isoform spike-ins, human tissues, and cell lines, we show that RATTLE accurately determines transcript sequences and their abundances, and shows good scalability with the number of transcripts.
Affiliation(s)
- Ivan de la Rubia
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia; Pompeu Fabra University (UPF), E08003, Barcelona, Spain
| | - Akanksha Srivastava
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia; Australian National University, Acton, Canberra, ACT, 2601, Australia
| | - Wenjing Xue
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia; Australian National University, Acton, Canberra, ACT, 2601, Australia
| | - Joel A Indi
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia; Universidade de Lisboa, Lisboa, Portugal
| | - Silvia Carbonell-Sala
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain; Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
| | - Julien Lagarde
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain; Centre for Regulatory Genomics (CRG), E08001, Barcelona, Spain
| | - M Mar Albà
- Pompeu Fabra University (UPF), E08003, Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain; Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
| | - Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Acton, Canberra, ACT, 2601, Australia; Australian National University, Acton, Canberra, ACT, 2601, Australia; Catalan Institution for Research and Advanced Studies (ICREA), E08010, Barcelona, Spain; Hospital del Mar Medical Research Institute (IMIM), E08001, Barcelona, Spain
| |
|
24
|
Tombácz D, Kakuk B, Torma G, Csabai Z, Gulyás G, Tamás V, Zádori Z, Jefferson VA, Meyer F, Boldogkői Z. In-Depth Temporal Transcriptome Profiling of an Alphaherpesvirus Using Nanopore Sequencing. Viruses 2022; 14:v14061289. [PMID: 35746760 PMCID: PMC9229804 DOI: 10.3390/v14061289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 06/05/2022] [Accepted: 06/08/2022] [Indexed: 12/10/2022] Open
Abstract
In this work, a long-read sequencing (LRS) technique based on the Oxford Nanopore Technologies MinION platform was used for the quantification and kinetic characterization of the poly(A) fraction of the bovine alphaherpesvirus type 1 (BoHV-1) lytic transcriptome across a 12-h infection period. Amplification-based LRS techniques frequently generate artefactual transcription reads and are biased towards the production of shorter amplicons. To avoid these undesired effects, we applied direct cDNA sequencing, an amplification-free technique. Here, we show that a single promoter can produce multiple transcription start sites whose distribution patterns differ among the viral genes but are similar in the same gene at different timepoints. Our investigations revealed that the circ gene is expressed with immediate–early (IE) kinetics by utilizing a special mechanism based on the use of the promoter of another IE gene (bicp4) for the transcriptional control. Furthermore, we detected an overlap between the initiation of DNA replication and the transcription from the bicp22 gene, which suggests an interaction between the two molecular machineries. This study developed a generally applicable LRS-based method for the time-course characterization of transcriptomes of any organism.
Affiliation(s)
- Dóra Tombácz
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| | - Balázs Kakuk
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| | - Gábor Torma
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| | - Zsolt Csabai
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| | - Gábor Gulyás
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| | - Vivien Tamás
- Institute for Veterinary Medical Research, Centre for Agricultural Research, Hungária krt. 21, 1143 Budapest, Hungary
| | - Zoltán Zádori
- Institute for Veterinary Medical Research, Centre for Agricultural Research, Hungária krt. 21, 1143 Budapest, Hungary
| | - Victoria A. Jefferson
- Department of Biochemistry & Molecular Biology, Entomology & Plant Pathology, Mississippi State University, 408 Dorman P.O. Box 9655, 32 Creelman St., Starkville, MS 39762, USA
| | - Florencia Meyer
- Department of Biochemistry & Molecular Biology, Entomology & Plant Pathology, Mississippi State University, 408 Dorman P.O. Box 9655, 32 Creelman St., Starkville, MS 39762, USA
| | - Zsolt Boldogkői
- Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Somogyi u. 4, 6720 Szeged, Hungary
| |
|
25
|
Wu C, Lu X, Lu S, Wang H, Li D, Zhao J, Jin J, Sun Z, He QY, Chen Y, Zhang G. Efficient Detection of the Alternative Spliced Human Proteome Using Translatome Sequencing. Front Mol Biosci 2022; 9:895746. [PMID: 35720116 PMCID: PMC9201276 DOI: 10.3389/fmolb.2022.895746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/14/2022] [Accepted: 04/28/2022] [Indexed: 01/08/2023] Open
Abstract
Alternative splicing (AS) isoforms create numerous proteoforms, expanding the complexity of the genome. Highly similar sequences, incomplete reference databases and the insufficient sequence coverage of mass spectrometry limit the identification of AS proteoforms. Here, we demonstrate a full-length translating mRNA (ribosome nascent-chain complex-bound mRNA, RNC-mRNA) sequencing (RNC-seq) strategy that sequences the entire translating mRNA using next-generation sequencing, including short-read and long-read technologies, to construct a protein database containing all translating AS isoforms. Taking advantage of read length, short-read RNC-seq identified up to 15,289 genes and 15,906 AS isoforms in a single human cell line, far more than Ribo-seq. Single-molecule long-read RNC-seq supplemented 4,429 annotated AS isoforms that were not identified by short-read datasets, and 4,525 novel AS isoforms that were not included in the public databases. Using this RNC-seq-guided database, we identified 6,766 annotated protein isoforms and 50 novel protein isoforms in mass spectrometry datasets. These results demonstrate the potential of full-length RNC-seq in investigating the proteome of AS isoforms.
Affiliation(s)
- Chun Wu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Xiaolong Lu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Shaohua Lu
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Sino-French Hoffmann Institute, Guangzhou Medical University, Guangzhou, China
| | - Hongwei Wang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Dehua Li
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Jing Zhao
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Jingjie Jin
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Zhenghua Sun
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Yang Chen
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| | - Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
| |
|
26
|
LncRNA Biomarkers of Inflammation and Cancer. Adv Exp Med Biol 2022; 1363:121-145. [PMID: 35220568 DOI: 10.1007/978-3-030-92034-0_7] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 12/09/2022]
Abstract
Long noncoding RNAs (lncRNAs) are promising candidates as biomarkers of inflammation and cancer. LncRNAs have several properties that make them well-suited as molecular markers of disease: (1) many lncRNAs are expressed in a tissue-specific manner, (2) distinct lncRNAs are upregulated based on different inflammatory or oncogenic stimuli, (3) lncRNAs released from cells are packaged and protected in extracellular vesicles, and (4) circulating lncRNAs in the blood are detectable using various RNA sequencing approaches. Here we focus on the potential for lncRNA biomarkers to detect inflammation and cancer, highlighting key biological, technological, and analytical considerations that will help advance the development of lncRNA-based liquid biopsies.
|
27
|
Wan Y, Zong C, Li X, Wang A, Li Y, Yang T, Bao Q, Dubow M, Yang M, Rodrigo LA, Mao C. New Insights for Biosensing: Lessons from Microbial Defense Systems. Chem Rev 2022; 122:8126-8180. [PMID: 35234463 DOI: 10.1021/acs.chemrev.1c01063] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/15/2022]
Abstract
Microorganisms have gained defense systems during the lengthy process of evolution over millions of years. Such defense systems can protect them from being attacked by invading species (e.g., CRISPR-Cas for establishing adaptive immune systems and nanopore-forming toxins as virulence factors) or enable them to adapt to different conditions (e.g., gas vesicles for achieving buoyancy control). These microorganism defense systems (MDS) have inspired the development of biosensors that have received much attention in a wide range of fields including life science research, food safety, and medical diagnosis. This Review comprehensively analyzes biosensing platforms originating from MDS for sensing and imaging biological analytes. We first describe a basic overview of MDS and MDS-inspired biosensing platforms (e.g., CRISPR-Cas systems, nanopore-forming proteins, and gas vesicles), followed by a critical discussion of their functions and properties. We then discuss several transduction mechanisms (optical, acoustic, magnetic, and electrical) involved in MDS-inspired biosensing. We further detail the applications of the MDS-inspired biosensors to detect a variety of analytes (nucleic acids, peptides, proteins, pathogens, cells, small molecules, and metal ions). In the end, we propose the key challenges and future perspectives in seeking new and improved MDS tools that can potentially lead to breakthrough discoveries in developing a new generation of biosensors with a combination of low cost; high sensitivity, accuracy, and precision; and fast detection. Overall, this Review gives a historical review of MDS, elucidates the principles of emulating MDS to develop biosensors, and analyzes the recent advancements, current challenges, and future trends in this field. It provides a unique critical analysis of emulating MDS to develop robust biosensors and discusses the design of such biosensors using elements found in MDS, showing that emulating MDS is a promising approach to conceptually advancing the design of biosensors.
Affiliation(s)
- Yi Wan
- State Key Laboratory of Marine Resource Utilization in the South China Sea, School of Pharmaceutical Sciences, Marine College, Hainan University, Haikou 570228, P. R. China
| | - Chengli Zong
- State Key Laboratory of Marine Resource Utilization in the South China Sea, School of Pharmaceutical Sciences, Marine College, Hainan University, Haikou 570228, P. R. China
| | - Xiangpeng Li
- Department of Bioengineering and Therapeutic Sciences, Schools of Medicine and Pharmacy, University of California, San Francisco, 1700 Fourth Street, Byers Hall 303C, San Francisco, California 94158, United States
| | - Aimin Wang
- State Key Laboratory of Marine Resource Utilization in the South China Sea, School of Pharmaceutical Sciences, Marine College, Hainan University, Haikou 570228, P. R. China
| | - Yan Li
- College of Animal Science, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Tao Yang
- School of Materials Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Qing Bao
- School of Materials Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Michael Dubow
- Institute for Integrative Biology of the Cell (I2BC), UMR 9198 CNRS, CEA, Université Paris-Saclay, Campus C.N.R.S, Bâtiment 12, Avenue de la Terrasse, 91190 Gif-sur-Yvette, France
| | - Mingying Yang
- College of Animal Science, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Ledesma-Amaro Rodrigo
- Imperial College Centre for Synthetic Biology, Department of Bioengineering, Imperial College London, London SW7 2AZ, United Kingdom
| | - Chuanbin Mao
- Department of Chemistry & Biochemistry, Stephenson Life Science Research Center, University of Oklahoma, 101 Stephenson Parkway, Norman, Oklahoma 73019, United States; School of Materials Science and Engineering, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| |
|
28
|
Grünberger F, Ferreira-Cerca S, Grohmann D. Nanopore sequencing of RNA and cDNA molecules in Escherichia coli. RNA 2022; 28:400-417. [PMID: 34906997 PMCID: PMC8848933 DOI: 10.1261/rna.078937.121] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 08/02/2021] [Accepted: 11/29/2021] [Indexed: 05/09/2023]
Abstract
High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely used to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that (PCR)-cDNA-seq offers improved yield and accuracy compared to direct RNA sequencing. Notably, (PCR)-cDNA-seq is suitable for quantitative measurements and can be readily used for simultaneous and accurate detection of transcript 5' and 3' boundaries, analysis of transcriptional units, and transcriptional heterogeneity. In summary, based on our comprehensive study, we show nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features. Thereby nanopore RNA-seq holds the potential to become a valuable alternative method for RNA analysis in prokaryotes.
Affiliation(s)
- Felix Grünberger
- Institute of Biochemistry, Genetics and Microbiology, Institute of Microbiology and Archaea Centre, Single-Molecule Biochemistry Lab and Biochemistry Centre Regensburg, University of Regensburg, 93053 Regensburg, Germany
| | - Sébastien Ferreira-Cerca
- Regensburg Center of Biochemistry (RCB), University of Regensburg, 93053 Regensburg, Germany
- Institute for Biochemistry, Genetics and Microbiology, Regensburg Center for Biochemistry, Biochemistry III, University of Regensburg, 93053 Regensburg, Germany
| | - Dina Grohmann
- Institute of Biochemistry, Genetics and Microbiology, Institute of Microbiology and Archaea Centre, Single-Molecule Biochemistry Lab and Biochemistry Centre Regensburg, University of Regensburg, 93053 Regensburg, Germany
- Regensburg Center of Biochemistry (RCB), University of Regensburg, 93053 Regensburg, Germany
| |
|
29
|
Gleeson J, Leger A, Prawer YDJ, Lane TA, Harrison PJ, Haerty W, Clark MB. Accurate expression quantification from nanopore direct RNA sequencing with NanoCount. Nucleic Acids Res 2022; 50:e19. [PMID: 34850115 PMCID: PMC8886870 DOI: 10.1093/nar/gkab1129] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 04/15/2021] [Revised: 10/23/2021] [Accepted: 10/27/2021] [Indexed: 11/13/2022] Open
Abstract
Accurately quantifying gene and isoform expression changes is essential to understanding cell functions, differentiation and disease. Sequencing full-length native RNAs using long-read direct RNA sequencing (DRS) has the potential to overcome many limitations of short and long-read sequencing methods that require RNA fragmentation, cDNA synthesis or PCR. However, there is a lack of tools specifically designed for DRS, and its ability to identify differential expression in complex organisms is poorly characterised. We developed NanoCount for fast, accurate transcript isoform quantification in DRS and demonstrate that it outperforms similar methods. Using synthetic controls and human SH-SY5Y cell differentiation into neuron-like cells, we show that DRS accurately quantifies RNA expression and identifies differential expression of genes and isoforms. Differential expression of 231 genes and 333 isoforms, plus 27 isoform switches, was detected between undifferentiated and differentiated SH-SY5Y cells, and samples clustered by differentiation state at the gene and isoform level. Genes upregulated in neuron-like cells were associated with neurogenesis. NanoCount quantification of thousands of novel isoforms discovered with DRS likewise enabled identification of their differential expression. Our results demonstrate enhanced DRS isoform quantification with NanoCount and establish the ability of DRS to identify biologically relevant differential expression of genes and isoforms.
Affiliation(s)
- Josie Gleeson
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
| | - Adrien Leger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yair D J Prawer
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
| | - Tracy A Lane
- Department of Psychiatry, University of Oxford, Oxford, UK
| | - Paul J Harrison
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Health NHS Foundation Trust, Oxford, UK
| | - Wilfried Haerty
- The Earlham Institute, Norwich, UK
- School of Biological Sciences, University of East Anglia, Norwich, UK
| | - Michael B Clark
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
- Department of Psychiatry, University of Oxford, Oxford, UK
| |
|
30
|
Fiszbein A, McGurk M, Calvo-Roitberg E, Kim G, Burge CB, Pai AA. Widespread occurrence of hybrid internal-terminal exons in human transcriptomes. Sci Adv 2022; 8:eabk1752. [PMID: 35044812 PMCID: PMC8769537 DOI: 10.1126/sciadv.abk1752] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/10/2021] [Accepted: 11/23/2021] [Indexed: 06/12/2023]
Abstract
Messenger RNA isoform differences are predominantly driven by alternative first, internal, and last exons. Despite the importance of classifying exons to understand isoform structure, few tools examine isoform-specific exon usage. We recently observed that alternative transcription start sites often arise near internal exons, often creating “hybrid” first/internal exons. To systematically detect hybrid exons, we built the hybrid-internal-terminal (HIT) pipeline to classify exons depending on their isoform-specific usage. On the basis of splice junction reads in RNA sequencing data and probabilistic modeling, the HIT index identified thousands of previously misclassified hybrid first-internal and internal-last exons. Hybrid exons are enriched in long genes and genes involved in RNA splicing and have longer flanking introns and strong splice sites. Their usage varies considerably across human tissues. By developing the first method to classify exons according to isoform contexts, our findings document the occurrence of hybrid exons, a common quirk of the human transcriptome.
Affiliation(s)
- Ana Fiszbein
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biology, Boston University, Boston, MA, USA
| | - Michael McGurk
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | - GyeungYun Kim
- Department of Biology, Boston University, Boston, MA, USA
| | - Christopher B. Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Athma A. Pai
- RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, USA
| |
|
31
|
Wright DJ, Hall NAL, Irish N, Man AL, Glynn W, Mould A, Angeles ADL, Angiolini E, Swarbreck D, Gharbi K, Tunbridge EM, Haerty W. Long read sequencing reveals novel isoforms and insights into splicing regulation during cell state changes. BMC Genomics 2022; 23:42. [PMID: 35012468 PMCID: PMC8744310 DOI: 10.1186/s12864-021-08261-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/11/2021] [Accepted: 12/15/2021] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Alternative splicing is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, which isoforms are differentially used or expressed, and how this affects cellular differentiation, remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of alternative splicing processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line, SH-SY5Y, and to characterise isoform expression and usage across differentiation. RESULTS We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, identifying candidates for future research into state change regulation. CONCLUSIONS Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.
Affiliation(s)
- David J Wright
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Nicola A L Hall
- Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
- Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
| | - Naomi Irish
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Angela L Man
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Will Glynn
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Arne Mould
- Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
- Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
| | - Alejandro De Los Angeles
- Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
- Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
| | - Emily Angiolini
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - David Swarbreck
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Karim Gharbi
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK
| | - Elizabeth M Tunbridge
- Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxfordshire, OX3 3JX, UK
- Oxford Health, NHS Foundation Trust, Oxford, Oxfordshire, OX3 7JX, UK
| | - Wilfried Haerty
- Earlham Institute, Norwich Research Park, Norfolk, NR4 7UZ, UK.
| |
|
32
|
Hamid F, Alasoo K, Vilo J, Makeyev E. Functional Annotation of Custom Transcriptomes. Methods Mol Biol 2022; 2537:149-172. [PMID: 35895263 DOI: 10.1007/978-1-0716-2521-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Many eukaryotic genes can give rise to different alternative transcripts depending on stage of development, cell type, and physiological cues. Current transcriptome-wide sequencing technologies highlight the remarkable extent of this regulation in metazoans and allow for RNA isoforms to be profiled in increasingly small biological samples and with a growing confidence. Understanding biological functions of sample-specific transcripts is a major challenge in genomics and RNA processing fields. Here we describe simple bioinformatics workflows that facilitate this task by streamlining reference-guided annotation of novel transcripts. A key part of our protocol is the R package factR that rapidly matches custom-assembled transcripts to their likely host genes, deduces the sequence and domain structure of novel protein products, and predicts sensitivity of newly identified RNA isoforms to nonsense-mediated decay.
Affiliation(s)
- Fursham Hamid
- Centre for Developmental Neurobiology, King's College London, London, UK.
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Jaak Vilo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Eugene Makeyev
- Centre for Developmental Neurobiology, King's College London, London, UK.
| |
|
33
|
Fang Y, Changavi A, Yang M, Sun L, Zhang A, Sun D, Sun Z, Zhang B, Xu M. Nanopore Whole Transcriptome Analysis and Pathogen Surveillance by a Novel Solid-Phase Catalysis Approach. Adv Sci (Weinh) 2022; 9:e2103373. [PMID: 34837482 PMCID: PMC8787394 DOI: 10.1002/advs.202103373] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/03/2021] [Revised: 10/28/2021] [Indexed: 06/13/2023]
Abstract
The requirement of a large input amount (500 ng) for Nanopore direct RNA-seq presents a major challenge for low input transcriptomic analysis and early pathogen surveillance. The high RNA input requirement is attributed to significant sample loss associated with library preparation using solid-phase reversible immobilization (SPRI) beads. A novel solid-phase catalysis strategy for RNA library preparation to circumvent the need for SPRI bead purification to remove enzymes is reported here. This new approach leverages concurrent processing of non-polyadenylated transcripts with immobilized poly(A) polymerase and T4 DNA ligase, followed by directly loading the prepared library onto a flow cell. Whole transcriptome sequencing, using a human pathogen Listeria monocytogenes as a model, demonstrates this new method displays little sample loss, takes much less time, and generates higher sequencing throughput correlated with reduced nanopore fouling compared to the current library preparation for 500 ng input. Consequently, this approach enables Nanopore low-input direct RNA-seq, improving pathogen detection and transcript identification in a microbial community standard with spike-in transcript controls. Besides, as evident in the bioinformatic analysis, the new method provides accurate RNA consensus with high fidelity and identifies higher numbers of expressed genes for both high and low input RNA amounts.
Affiliation(s)
- Yi Fang
- New England Biolabs, Inc., Ipswich, MA 01938, USA
| | | | - Manyun Yang
- Department of Microbiology and Immunology, Cornell University, Ithaca, NY 14853, USA
| | - Luo Sun
- New England Biolabs, Inc., Ipswich, MA 01938, USA
| | | | - Daniel Sun
- New England Biolabs, Inc., Ipswich, MA 01938, USA
| | - Zhiyi Sun
- New England Biolabs, Inc., Ipswich, MA 01938, USA
| | - Boce Zhang
- Department of Food Science and Human Nutrition, University of Florida, Gainesville, FL 32603, USA
| | | |
|
34
|
Kuo MC, Liu SCH, Hsu YF, Wu RM. The role of noncoding RNAs in Parkinson's disease: biomarkers and associations with pathogenic pathways. J Biomed Sci 2021; 28:78. [PMID: 34794432 PMCID: PMC8603508 DOI: 10.1186/s12929-021-00775-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 07/31/2021] [Accepted: 11/04/2021] [Indexed: 02/08/2023] Open
Abstract
The discovery of various noncoding RNAs (ncRNAs) and their biological implications is a growing area in cell biology. Increasing evidence has revealed canonical and noncanonical functions of long and small ncRNAs, including microRNAs, long ncRNAs (lncRNAs), circular RNAs, PIWI-interacting RNAs, and tRNA-derived fragments. These ncRNAs have the ability to regulate gene expression and modify metabolic pathways. Thus, they may have important roles as diagnostic biomarkers or therapeutic targets in various diseases, including neurodegenerative disorders, especially Parkinson's disease. Recently, through diverse sequencing technologies and a wide variety of bioinformatic analytical tools, such as reverse transcriptase quantitative PCR, microarrays, next-generation sequencing and long-read sequencing, numerous ncRNAs have been shown to be associated with neurodegenerative disorders, including Parkinson's disease. In this review article, we will first introduce the biogenesis of different ncRNAs, including microRNAs, PIWI-interacting RNAs, circular RNAs, long noncoding RNAs, and tRNA-derived fragments. The pros and cons of the detection platforms of ncRNAs and the reproducibility of bioinformatic analytical tools will be discussed in the second part. Finally, the recent discovery of numerous PD-associated ncRNAs and their association with the diagnosis and pathophysiology of PD are reviewed, and microRNAs and long ncRNAs that are transported by exosomes in biofluids are particularly emphasized.
Affiliation(s)
- Ming-Che Kuo
- Department of Medicine, Section of Neurology, Cancer Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Neurology, National Taiwan University Hospital, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Sam Chi-Hao Liu
- Department of Neurology, National Taiwan University Hospital, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Ya-Fang Hsu
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Ruey-Meei Wu
- Department of Neurology, National Taiwan University Hospital, College of Medicine, National Taiwan University, Taipei, Taiwan.
- Graduate Institute of Brain and Mind Sciences, College of Medicine, National Taiwan University, Taipei, Taiwan.
| |
|
35
|
Lagarrigue S, Lorthiois M, Degalez F, Gilot D, Derrien T. LncRNAs in domesticated animals: from dog to livestock species. Mamm Genome 2021; 33:248-270. [PMID: 34773482 PMCID: PMC9114084 DOI: 10.1007/s00335-021-09928-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Accepted: 10/19/2021] [Indexed: 11/29/2022]
Abstract
Animal genomes are pervasively transcribed into multiple RNA molecules, of which many will not be translated into proteins. One major component of this transcribed non-coding genome is the long non-coding RNAs (lncRNAs), which are defined as transcripts longer than 200 nucleotides with low coding-potential capabilities. Domestic animals constitute a unique resource for studying the genetic and epigenetic basis of phenotypic variations involving protein-coding and non-coding RNAs, such as lncRNAs. This review presents the current knowledge regarding transcriptome-based catalogues of lncRNAs in major domesticated animals (pets and livestock species), covering a broad phylogenetic scale (from dogs to chicken), and in comparison with human and mouse lncRNA catalogues. Furthermore, we describe different methods to extract known or discover novel lncRNAs and explore comparative genomics approaches to strengthen the annotation of lncRNAs. We then detail different strategies contributing to a better understanding of lncRNA functions, from genetic studies such as GWAS to molecular biology experiments and give some case examples in domestic animals. Finally, we discuss the limitations of current lncRNA annotations and suggest research directions to improve them and their functional characterisation.
Affiliation(s)
| | - Matthias Lorthiois
- Univ Rennes, CNRS, IGDR (Institut de Génétique et Développement de Rennes) - UMR 6290, 2 av Prof Leon Bernard, F-35000, Rennes, France
| | - Fabien Degalez
- INRAE, INSTITUT AGRO, PEGASE UMR 1348, 35590, Saint-Gilles, France
| | - David Gilot
- CLCC Eugène Marquis, INSERM, Université Rennes, UMR_S 1242, 35000, Rennes, France
| | - Thomas Derrien
- Univ Rennes, CNRS, IGDR (Institut de Génétique et Développement de Rennes) - UMR 6290, 2 av Prof Leon Bernard, F-35000, Rennes, France.
| |
|
36
|
Pyatnitskiy MA, Arzumanian VA, Radko SP, Ptitsyn KG, Vakhrushev IV, Poverennaya EV, Ponomarenko EA. Oxford Nanopore MinION Direct RNA-Seq for Systems Biology. Biology (Basel) 2021; 10:1131. [PMID: 34827124 PMCID: PMC8615092 DOI: 10.3390/biology10111131] [Citation(s) in RCA: 146] [Impact Index Per Article: 48.7] [Received: 10/08/2021] [Revised: 10/28/2021] [Accepted: 11/02/2021] [Indexed: 12/14/2022]
Abstract
Long-read direct RNA sequencing developed by Oxford Nanopore Technologies (ONT) is quickly gaining popularity for transcriptome studies, while fast turnaround time and low cost make it an attractive instrument for clinical applications. There is a growing interest to utilize transcriptome data to unravel activated biological processes responsible for disease progression and response to therapies. This trend is of particular interest for precision medicine which aims at single-patient analysis. Here we evaluated whether gene abundances measured by MinION direct RNA sequencing are suited to produce robust estimates of pathway activation for single sample scoring methods. We performed multiple RNA-seq analyses for a single sample that originated from the HepG2 cell line, namely five ONT replicates, and three replicates using Illumina NovaSeq. Two pathway scoring methods were employed: ssGSEA and singscore. We estimated the ONT performance in terms of detected protein-coding genes and average pairwise correlation between pathway activation scores using an exhaustive computational scheme for all combinations of replicates. In brief, we found that at least two ONT replicates are required to obtain reproducible pathway scores for both algorithms. We hope that our findings may be of interest to researchers planning their ONT direct RNA-seq experiments.
Affiliation(s)
- Mikhail A. Pyatnitskiy
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
- Federal Research and Clinical Center of Physical-Chemical Medicine, 119435 Moscow, Russia
| | - Viktoriia A. Arzumanian
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| | - Sergey P. Radko
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| | - Konstantin G. Ptitsyn
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| | - Igor V. Vakhrushev
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| | - Ekaterina V. Poverennaya
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| | - Elena A. Ponomarenko
- Institute of Biomedical Chemistry, 119121 Moscow, Russia
| |
|
37
|
Ibrahim F, Oppelt J, Maragkakis M, Mourelatos Z. TERA-Seq: true end-to-end sequencing of native RNA molecules for transcriptome characterization. Nucleic Acids Res 2021; 49:e115. [PMID: 34428294 PMCID: PMC8599856 DOI: 10.1093/nar/gkab713] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/06/2021] [Revised: 07/31/2021] [Accepted: 08/18/2021] [Indexed: 11/14/2022] Open
Abstract
Direct sequencing of single, native RNA molecules through nanopores has a strong potential to transform research in all aspects of RNA biology and clinical diagnostics. The existing platform from Oxford Nanopore Technologies is unable to sequence the very 5′ ends of RNAs and is limited to polyadenylated molecules. Here, we develop True End-to-end RNA Sequencing (TERA-Seq), a platform that addresses these limitations, permitting more thorough transcriptome characterization. TERA-Seq describes both poly- and non-polyadenylated RNA molecules and accurately identifies their native 5′ and 3′ ends by ligating uniquely designed adapters that are sequenced along with the transcript. We find that capped, full-length mRNAs in human cells show marked variation of poly(A) tail lengths at the single molecule level. We report prevalent capping downstream of canonical transcriptional start sites in otherwise fully spliced and polyadenylated molecules. We reveal RNA processing and decay at single molecule level and find that mRNAs decay cotranslationally, often from their 5′ ends, while frequently retaining poly(A) tails. TERA-Seq will prove useful in many applications where true end-to-end direct sequencing of single, native RNA molecules and their isoforms is desirable.
Affiliation(s)
- Fadia Ibrahim
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Biochemistry and Molecular Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA 19107, USA
| | - Jan Oppelt
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Manolis Maragkakis
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD 21224, USA
| | - Zissimos Mourelatos
- Department of Pathology and Laboratory Medicine, Division of Neuropathology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| |
|
38
|
Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021; 11:862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022] Open
Abstract
With the advantages that long-read sequencing platforms such as Pacific Biosciences (Menlo Park, CA, USA) (PacBio) and Oxford Nanopore Technologies (Oxford, UK) (ONT) can offer, various research fields such as genomics and transcriptomics can exploit their benefits. Selecting an appropriate sequencing platform is undoubtedly crucial for the success of the research outcome, thus there is a need to compare these long-read sequencing platforms and evaluate them for specific research questions. This study aims to compare the performance of PacBio and ONT platforms for transcriptomic analysis by utilizing transcriptome data from three different tissues (hepatopancreas, intestine, and gonads) of the juvenile black tiger shrimp, Penaeus monodon. We compared three important features: (i) main characteristics of the sequencing libraries and their alignment with the reference genome, (ii) transcript assembly features and isoform identification, and (iii) correlation of the quantification of gene expression levels for both platforms. Our analyses suggest that read-length bias and differences in sequencing throughput are highly influential factors when using long reads in transcriptome studies. These comparisons can provide a guideline when designing a transcriptome study utilizing these two long-read sequencing technologies.
|
39
|
De Paoli-Iseppi R, Gleeson J, Clark MB. Isoform Age - Splice Isoform Profiling Using Long-Read Technologies. Front Mol Biosci 2021; 8:711733. [PMID: 34409069 PMCID: PMC8364947 DOI: 10.3389/fmolb.2021.711733] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/19/2021] [Accepted: 07/19/2021] [Indexed: 01/12/2023] Open
Abstract
Alternative splicing (AS) of RNA is a key mechanism that results in the expression of multiple transcript isoforms from single genes and leads to an increase in the complexity of both the transcriptome and proteome. Regulation of AS is critical for the correct functioning of many biological pathways, while disruption of AS can be directly pathogenic in diseases such as cancer or cause risk for complex disorders. Current short-read sequencing technologies achieve high read depth but are limited in their ability to resolve complex isoforms. In this review we examine how long-read sequencing (LRS) technologies can address this challenge by covering the entire RNA sequence in a single read and thereby distinguish isoform changes that could impact RNA regulation or protein function. Coupling LRS with technologies such as single cell sequencing, targeted sequencing and spatial transcriptomics is producing a rapidly expanding suite of technological approaches to profile alternative splicing at the isoform level with unprecedented detail. In addition, integrating LRS with genotype now allows the impact of genetic variation on isoform expression to be determined. Recent results demonstrate the potential of these techniques to elucidate the landscape of splicing, including in tissues such as the brain where AS is particularly prevalent. Finally, we also discuss how AS can impact protein function, potentially leading to novel therapeutic targets for a range of diseases.
Affiliation(s)
| | | | - Michael B. Clark
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
| |
|
40
|
Zhu C, Wu J, Sun H, Briganti F, Meder B, Wei W, Steinmetz LM. Single-molecule, full-length transcript isoform sequencing reveals disease-associated RNA isoforms in cardiomyocytes. Nat Commun 2021; 12:4203. [PMID: 34244519 PMCID: PMC8270901 DOI: 10.1038/s41467-021-24484-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/22/2021] [Accepted: 06/22/2021] [Indexed: 01/06/2023] Open
Abstract
Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we establish an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generate a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms ( http://steinmetzlab.embl.de/iBrowser/ ). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identify 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by the discovery of IMMT isoforms mis-spliced by RBM20 mutations. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms, providing more immediate biological interpretation and higher resolution transcriptome comparisons.
Affiliation(s)
- Chenchen Zhu
- Department of Genetics, School of Medicine, Stanford University, Stanford, USA
| | - Jingyan Wu
- Department of Genetics, School of Medicine, Stanford University, Stanford, USA
| | - Han Sun
- Department of Genetics, School of Medicine, Stanford University, Stanford, USA
| | - Francesca Briganti
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Cardiovascular Institute and Department of Medicine, Stanford University, Stanford, USA
- Collaboration for joint PhD degree between EMBL and Heidelberg University, Faculty of Biosciences, Heidelberg, Germany
| | - Benjamin Meder
- Department of Genetics, School of Medicine, Stanford University, Stanford, USA
- Institute for Cardiomyopathies Heidelberg (ICH), Heart Center Heidelberg, University of Heidelberg, Heidelberg, Germany
- DZHK (German Center for Cardiovascular Research), partner site Heidelberg, Heidelberg, Germany
- Department of Medicine III, University of Heidelberg, Heidelberg, Germany
| | - Wu Wei
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
- Center for Biomedical Informatics, Shanghai Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, China.
- Stanford Genome Technology Center, Stanford University, Palo Alto, USA.
| | - Lars M Steinmetz
- Department of Genetics, School of Medicine, Stanford University, Stanford, USA.
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.
- Cardiovascular Institute and Department of Medicine, Stanford University, Stanford, USA.
- Stanford Genome Technology Center, Stanford University, Palo Alto, USA.
- DZHK (German Center for Cardiovascular Research), partner site EMBL Heidelberg, Heidelberg, Germany.
| |
|
41
|
Haveman NJ, Khodadad CLM, Dixit AR, Louyakis AS, Massa GD, Venkateswaran K, Foster JS. Evaluating the lettuce metatranscriptome with MinION sequencing for future spaceflight food production applications. NPJ Microgravity 2021; 7:22. [PMID: 34140518 PMCID: PMC8211661 DOI: 10.1038/s41526-021-00151-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/30/2020] [Accepted: 06/03/2021] [Indexed: 02/05/2023] Open
Abstract
Healthy plants are vital for successful, long-duration missions in space, as they provide the crew with life support, food production, and psychological benefits. The microorganisms that associate with plant tissues play a critical role in improving plant health and production. To that end, we developed a methodology to investigate the transcriptional activities of the microbiome of red romaine lettuce, a key salad crop that was grown under International Space Station (ISS)-like conditions. Microbial transcripts enriched from host-microbe total RNA were sequenced using the Oxford Nanopore MinION sequencing platform. Results show that this enrichment approach was highly reproducible and could be an effective approach for the on-site detection of microbial transcriptional activity. Our results demonstrate the feasibility of using metatranscriptomics of enriched microbial RNA as a potential method for on-site monitoring of the transcriptional activity of crop microbiomes, thereby helping to facilitate and maintain plant health for on-orbit space food production.
Affiliation(s)
- Natasha J. Haveman
- Department of Microbiology and Cell Science, University of Florida, Space Life Science Lab, Merritt Island, FL, USA
| | - Christina L. M. Khodadad
- Amentum Services, Inc., LASSO, Kennedy Space Center, Merritt Island, FL, USA
| | - Anirudha R. Dixit
- Amentum Services, Inc., LASSO, Kennedy Space Center, Merritt Island, FL, USA
| | - Artemis S. Louyakis
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Gioia D. Massa
- Space Crop Production Team, Kennedy Space Center, Merritt Island, FL, USA
| | - Kasthuri Venkateswaran
- Biotechnology and Planetary Protection Group, Jet Propulsion Laboratory, Pasadena, CA, USA
| | - Jamie S. Foster
- Department of Microbiology and Cell Science, University of Florida, Space Life Science Lab, Merritt Island, FL, USA
| |
|
42
|
Massaiu I, Songia P, Chiesa M, Valerio V, Moschetta D, Alfieri V, Myasoedova VA, Schmid M, Cassetta L, Colombo GI, D’Alessandra Y, Poggio P. Evaluation of Oxford Nanopore MinION RNA-Seq Performance for Human Primary Cells. Int J Mol Sci 2021; 22:6317. [PMID: 34204756 PMCID: PMC8231517 DOI: 10.3390/ijms22126317] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/20/2021] [Revised: 05/17/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022] Open
Abstract
Transcript sequencing is a crucial tool for gaining a deep understanding of biological processes in diagnostic and clinical medicine. Given their potential to study novel complex eukaryotic transcriptomes, long-read sequencing technologies are able to overcome some limitations of short-read RNA-Seq approaches. Oxford Nanopore Technologies (ONT) offers the ability to generate long-read sequencing data in real time via portable protein nanopore USB devices. This work aimed to provide the user with the number of reads that should be sequenced, through the ONT MinION platform, to reach the desired accuracy level for a human cell RNA study. We sequenced three cDNA libraries prepared from poly-adenosine RNA of human primary cardiac fibroblasts. Since the runs were comparable, they were combined in a total dataset of 48 million reads. Synthetic datasets with different sizes were generated starting from the total and analyzed in terms of the number of identified genes and their expression levels. As expected, an improved sensitivity was obtained, increasing the sequencing depth, particularly for the non-coding genes. The reliability of expression levels was assayed by (i) comparison with PCR quantifications of selected genes and (ii) by the implementation of a user-friendly multiplexing method in a single run.
Affiliation(s)
- Ilaria Massaiu
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Paola Songia
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Mattia Chiesa
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Vincenza Valerio
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
- Dipartimento di Medicina Clinica e Chirurgia, Università degli Studi di Napoli Federico II, 80131 Napoli, Italy
| | - Donato Moschetta
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
- Dipartimento di Scienze Farmacologiche e Biomolecolari, Università degli Studi di Milano, 20133 Milano, Italy
| | - Valentina Alfieri
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Veronika A. Myasoedova
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Michael Schmid
- Genexa AG, Dienerstrasse 7, CH-8004 Zürich, Switzerland
| | - Luca Cassetta
- The Queen’s Medical Research Council Centre for Reproductive Health, University of Edinburgh, Edinburgh EH16 4TJ, UK
| | - Gualtiero I. Colombo
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Yuri D’Alessandra
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| | - Paolo Poggio
- Centro Cardiologico Monzino IRCCS, 20131 Milan, Italy
| |
|
43
|
Halstead MM, Islas-Trejo A, Goszczynski DE, Medrano JF, Zhou H, Ross PJ. Large-Scale Multiplexing Permits Full-Length Transcriptome Annotation of 32 Bovine Tissues From a Single Nanopore Flow Cell. Front Genet 2021; 12:664260. [PMID: 34093657 PMCID: PMC8173071 DOI: 10.3389/fgene.2021.664260] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/04/2021] [Accepted: 04/06/2021] [Indexed: 12/18/2022] Open
Abstract
A comprehensive annotation of transcript isoforms in domesticated species is lacking. Especially considering that transcriptome complexity and splicing patterns are not well-conserved between species, this presents a substantial obstacle to genomic selection programs that seek to improve production, disease resistance, and reproduction. Recent advances in long-read sequencing technology have made it possible to directly extrapolate the structure of full-length transcripts without the need for transcript reconstruction. In this study, we demonstrate the power of long-read sequencing for transcriptome annotation by coupling Oxford Nanopore Technology (ONT) with large-scale multiplexing of 93 samples, comprising 32 tissues collected from adult male and female Hereford cattle. More than 30 million uniquely mapping full-length reads were obtained from a single ONT flow cell, and used to identify and characterize the expression dynamics of 99,044 transcript isoforms at 31,824 loci. Of these predicted transcripts, 21% exactly matched a reference transcript, and 61% were novel isoforms of reference genes, substantially increasing the ratio of transcript variants per gene, and suggesting that the complexity of the bovine transcriptome is comparable to that in humans. Over 7,000 transcript isoforms were extremely tissue-specific, and 61% of these were attributed to testis, which exhibited the most complex transcriptome of all interrogated tissues. Despite profiling over 30 tissues, transcription was only detected at about 60% of reference loci. Consequently, additional studies will be necessary to continue characterizing the bovine transcriptome in additional cell types, developmental stages, and physiological conditions. However, by here demonstrating the power of ONT sequencing coupled with large-scale multiplexing, the task of exhaustively annotating the bovine transcriptome - or any mammalian transcriptome - appears significantly more feasible.
Affiliation(s)
| | | | | | | | | | - Pablo J. Ross
- Department of Animal Science, University of California, Davis, Davis, CA, United States
| |
|
44
|
BoardION: real-time monitoring of Oxford Nanopore sequencing instruments. BMC Bioinformatics 2021; 22:245. [PMID: 33985424 PMCID: PMC8120926 DOI: 10.1186/s12859-021-04161-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/05/2021] [Accepted: 05/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: One of the main advantages of the Oxford Nanopore Technology (ONT) is the possibility of real-time sequencing. This gives access to information during the experiment and allows either to control the sequencing or to stop the sequencing once the results have been obtained. However, the ONT sequencing interface is not sufficient to explore the quality of sequencing data in depth and existing quality control tools do not take full advantage of real-time data streaming. RESULTS: Herein, we present BoardION, an interactive web application to analyze the efficiency of ONT sequencing runs. The interactive interface of BoardION allows users to easily explore sequencing metrics and optimize the quantity and the quality of the data generated during the experiment. It also enables the comparison of multiple flowcells to assess library preparation protocols or the quality of input samples. CONCLUSION: BoardION is dedicated to people who manage ONT sequencing instruments and allows them to remotely and in real time monitor their experiments and compare multiple sequencing runs. Source code, a Docker image and a demo version are available at http://www.genoscope.cns.fr/boardion/.
|
45
|
Liu S, Wu I, Yu YP, Balamotis M, Ren B, Ben Yehezkel T, Luo JH. Targeted transcriptome analysis using synthetic long read sequencing uncovers isoform reprograming in the progression of colon cancer. Commun Biol 2021; 4:506. [PMID: 33907296 PMCID: PMC8079361 DOI: 10.1038/s42003-021-02024-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/06/2020] [Accepted: 03/09/2021] [Indexed: 02/02/2023] Open
Abstract
The characterization of human gene expression is limited by short read lengths, high error rates and large input requirements. Here, we used a synthetic long read (SLR) sequencing approach, LoopSeq, to generate accurate sequencing reads that span full length transcripts using standard short read data. LoopSeq identified isoforms from control samples with 99.4% accuracy and a 0.01% per-base error rate, exceeding the accuracy reported for other long-read technologies. Applied to targeted transcriptome sequencing from colon cancers and their metastatic counterparts, LoopSeq revealed large scale isoform redistributions from benign colon mucosa to primary colon cancer and metastatic cancer and identified several previously unknown fusion isoforms. Strikingly, single nucleotide variants (SNVs) occurred dominantly in specific isoforms and some SNVs underwent isoform switching in cancer progression. The ability to use short reads to generate accurate long-read data as the raw unit of information holds promise as a widely accessible approach in transcriptome sequencing.
Affiliation(s)
- Silvia Liu
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
| | - Indira Wu
- Loop Genomics, Inc., San Jose, CA, 95138, USA
| | - Yan-Ping Yu
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
| | | | - Baoguo Ren
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
- High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA
| | | | - Jian-Hua Luo
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA.
- High Throughput Genome Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA.
- Pittsburgh Liver Research Center, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15261, USA.
| |
|
46
|
Torma G, Tombácz D, Csabai Z, Moldován N, Mészáros I, Zádori Z, Boldogkői Z. Combined Short and Long-Read Sequencing Reveals a Complex Transcriptomic Architecture of African Swine Fever Virus. Viruses 2021; 13:v13040579. [PMID: 33808073 PMCID: PMC8103240 DOI: 10.3390/v13040579] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/15/2021] [Revised: 03/17/2021] [Accepted: 03/28/2021] [Indexed: 11/16/2022]
Abstract
African swine fever virus (ASFV) is a large DNA virus belonging to the Asfarviridae family. Despite its agricultural importance, little is known about the fundamental molecular mechanisms of this pathogen. Short-read sequencing (SRS) can produce a huge amount of high-precision sequencing reads for transcriptomic profiling, but it is inefficient for comprehensively annotating transcriptomes. Long-read sequencing (LRS) can overcome some of SRS's limitations, but it also has drawbacks, such as low-coverage and high error rate. The limitations of the two approaches can be surmounted by the combined use of these techniques. In this study, we used Illumina SRS and Oxford Nanopore Technologies LRS platforms with multiple library preparation methods (amplified and direct cDNA sequencings and native RNA sequencing) for constructing the ASFV transcriptomic atlas. This work identified many novel transcripts and transcript isoforms and annotated the precise termini of previously described RNAs. This study identified a novel species of ASFV transcripts, the replication origin-associated RNAs. Additionally, we discovered several nested genes embedded into larger canonical genes. In contrast to the current view that the ASFV transcripts are monocistronic, we detected a significant extent of polycistronism. A multifaceted meshwork of transcriptional overlaps was also discovered.
Affiliation(s)
- Gábor Torma
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Somogyi B. u. 4., 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (N.M.)
| | - Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Somogyi B. u. 4., 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (N.M.)
| | - Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Somogyi B. u. 4., 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (N.M.)
| | - Norbert Moldován
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Somogyi B. u. 4., 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (N.M.)
| | - István Mészáros
- Institute for Veterinary Medical Research, Centre for Agricultural Research, Hungária krt. 21, H-1143 Budapest, Hungary; (I.M.); (Z.Z.)
| | - Zoltán Zádori
- Institute for Veterinary Medical Research, Centre for Agricultural Research, Hungária krt. 21, H-1143 Budapest, Hungary; (I.M.); (Z.Z.)
| | - Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Somogyi B. u. 4., 6720 Szeged, Hungary; (G.T.); (D.T.); (Z.C.); (N.M.)
| |
|
47
|
Broseus L, Thomas A, Oldfield AJ, Severac D, Dubois E, Ritchie W. TALC: Transcript-level Aware Long-read Correction. Bioinformatics 2021; 36:5000-5006. [PMID: 32910174 DOI: 10.1093/bioinformatics/btaa634] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/03/2020] [Revised: 05/08/2020] [Accepted: 07/09/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous 'hybrid correction' algorithms have been developed for genomic data that correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited for correcting more complex transcriptome sequencing data. RESULTS: We have created a novel reference-free algorithm called Transcript-level Aware Long-Read Correction (TALC) which models changes in RNA expression and isoform representation in a weighted De Bruijn graph to correct long reads from transcriptome studies. We show that transcript-level aware correction by TALC improves the accuracy of the whole spectrum of downstream RNA-seq applications and is thus necessary for transcriptome analyses that use long read technology. AVAILABILITY AND IMPLEMENTATION: TALC is implemented in C++ and available at https://github.com/lbroseus/TALC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lucile Broseus
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Aubin Thomas
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Andrew J Oldfield
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| | - Dany Severac
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier Cedex 5 34094, France
| | - Emeric Dubois
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier Cedex 5 34094, France
| | - William Ritchie
- Department of Genome Dynamics, Institut de Génétique Humaine, Centre National de la Recherche Scientifique (CNRS), Université de Montpellier, Montpellier 34396, France
| |
|
48
|
Parker MT, Knop K, Barton GJ, Simpson GG. 2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing. Genome Biol 2021; 22:72. [PMID: 33648554 PMCID: PMC7919322 DOI: 10.1186/s13059-021-02296-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/27/2020] [Accepted: 02/10/2021] [Indexed: 01/04/2023]
Abstract
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
Affiliation(s)
- Matthew T Parker
- School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK.
| | - Katarzyna Knop
- School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
| | - Geoffrey J Barton
- School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
| | - Gordon G Simpson
- School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
- James Hutton Institute, Invergowrie, DD2 5DA, UK
| |
|
49
|
Identification of Dominant Transcripts in Oxidative Stress Response by a Full-Length Transcriptome Analysis. Mol Cell Biol 2021; 41:MCB.00472-20. [PMID: 33168698 DOI: 10.1128/mcb.00472-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2020] [Accepted: 11/02/2020] [Indexed: 12/30/2022]
Abstract
Our body responds to environmental stress by changing the expression levels of a series of cytoprotective enzymes/proteins through multilayered regulatory mechanisms, including the KEAP1-NRF2 system. While NRF2 upregulates the expression of many cytoprotective genes, there are fundamental limitations in short-read RNA sequencing (RNA-Seq), resulting in confusion regarding interpreting the effectiveness of cytoprotective gene induction at the transcript level. To precisely delineate isoform usage in the stress response, we conducted independent full-length transcriptome profiling (isoform sequencing; Iso-Seq) analyses of lymphoblastoid cells from three volunteers under normal and electrophilic stress-induced conditions. We first determined the first exon usage in KEAP1 and NFE2L2 (encoding NRF2) and found the presence of transcript diversity. We then examined changes in isoform usage of NRF2 target genes under stress conditions and identified a few isoforms dominantly expressed in the majority of NRF2 target genes. The expression levels of isoforms determined by Iso-Seq analyses showed striking differences from those determined by short-read RNA-Seq; the latter could be misleading concerning the abundance of transcripts. These results support that transcript usage is tightly regulated to produce functional proteins under electrophilic stress. Our present study strongly argues that there are important benefits that can be achieved by long-read transcriptome sequencing.
|
50
|
Sahlin K, Medvedev P. Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nat Commun 2021; 12:2. [PMID: 33397972 PMCID: PMC7782715 DOI: 10.1038/s41467-020-20340-8] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 05/27/2020] [Accepted: 11/25/2020] [Indexed: 01/24/2023]
Abstract
Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference-bias, error correction is a crucial step towards the reconstruction of the sequenced transcripts and downstream sequence analysis of transcripts. In this paper, we present a novel computational method to error correct ONT cDNA sequencing data, called isONcorrect. IsONcorrect is able to jointly use all isoforms from a gene during error correction, thereby allowing it to correct reads at low sequencing depths. We are able to obtain a median accuracy of 98.9-99.6%, demonstrating the feasibility of applying cost-effective cDNA full transcript length sequencing for reference-free transcriptome analysis.
Affiliation(s)
- Kristoffer Sahlin
- Department of Mathematics, Science for Life Laboratory, Stockholm University, 106 91, Stockholm, Sweden
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA.
| |
|