1
Shoaran M, Sabaie H, Mostafavi M, Rezazadeh M. A comprehensive review of the applications of RNA sequencing in celiac disease research. Gene 2024; 927:148681. [PMID: 38871036 DOI: 10.1016/j.gene.2024.148681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 06/06/2024] [Accepted: 06/10/2024] [Indexed: 06/15/2024]
Abstract
RNA sequencing (RNA-seq) has undergone substantial advancements in recent decades and has emerged as a vital technique for profiling the transcriptome. The transition from bulk sequencing to single-cell and spatial approaches has enabled higher precision at single-cell resolution. These approaches provide valuable biological knowledge about individual immune cells and aid in the discovery of the molecular mechanisms that contribute to the development of autoimmune diseases. Celiac disease (CeD) is an autoimmune disorder characterized by a strong immune response to gluten consumption. RNA-seq has significantly advanced research in multiple fields, including CeD research. It has been instrumental in studies involving comparative transcriptomics, nutritional genomics and wheat research, cancer research in the context of CeD, genetic and noncoding RNA-mediated epigenetic insights, disease monitoring and biomarker discovery, regulation of mitochondrial functions, therapeutic target identification and drug mechanism of action, dietary factors, immune cell profiling and the immune landscape. This review offers a comprehensive examination of recent RNA-seq technology research in the field of CeD, highlighting future challenges and opportunities for its application.
Affiliation(s)
- Maryam Shoaran
  - Pediatric Health Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Hani Sabaie
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Mehrnaz Mostafavi
  - Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam Rezazadeh
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
2
Emonet A, Pérez-Antón M, Neumann U, Dunemann S, Huettel B, Koller R, Hay A. Amphicarpic development in Cardamine chenopodiifolia. The New Phytologist 2024; 244:1041-1056. [PMID: 39030843 DOI: 10.1111/nph.19965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 06/25/2024] [Indexed: 07/22/2024]
Abstract
Amphicarpy is an unusual trait where two fruit types develop on the same plant: one above and the other belowground. This trait is not found in conventional model species. Therefore, its development and molecular genetics remain under-studied. Here, we establish the allooctoploid Cardamine chenopodiifolia as an emerging experimental system to study amphicarpy. We characterized C. chenopodiifolia development, focusing on differences in morphology and cell wall histochemistry between above- and belowground fruit. We generated a reference transcriptome with PacBio full-length transcript sequencing and analysed differential gene expression between above- and belowground fruit valves. Cardamine chenopodiifolia has two contrasting modes of seed dispersal. The main shoot fails to bolt and initiates floral primordia that grow underground where they self-pollinate and set seed. By contrast, axillary shoots bolt and develop exploding seed pods aboveground. Morphological differences between aerial explosive fruit and subterranean nonexplosive fruit were reflected in a large number of differentially regulated genes involved in photosynthesis, secondary cell wall formation and defence responses. Tools established in C. chenopodiifolia, such as a reference transcriptome, draft genome assembly and stable plant transformation, pave the way to study amphicarpy and trait evolution via allopolyploidy.
Affiliation(s)
- Aurélia Emonet
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
- Miguel Pérez-Antón
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
- Ulla Neumann
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
- Sonja Dunemann
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
- Bruno Huettel
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
- Robert Koller
  - Forschungszentrum Jülich GmbH, Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Wilhelm-Johnen-Street, Jülich, 52425, Germany
- Angela Hay
  - Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, Köln, 50829, Germany
3
Reis RS, Clúa J, Jaskolowski A, Deforges J, Jacques-Vuarambon D, Guex N, Poirier Y. Phosphate deficiency alters transcript isoforms via alternative transcription start sites. The Plant Journal: for Cell and Molecular Biology 2024; 120:218-233. [PMID: 39164918 DOI: 10.1111/tpj.16982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 05/24/2024] [Accepted: 07/13/2024] [Indexed: 08/22/2024]
Abstract
Alternative transcription start sites (TSS) are widespread in eukaryotes and can alter the 5' UTR length and coding potential of transcripts. Here we show that inorganic phosphate (Pi) availability regulates the usage of several alternative TSS in Arabidopsis (Arabidopsis thaliana). In comparison to phytohormone treatment, Pi had a pronounced and specific effect on the usage of many alternative TSS. By combining short-read RNA sequencing with long-read sequencing of full-length mRNAs, we identified a set of 45 genes showing alternative TSS under Pi deficiency. Alternative TSS affected several processes, such as translation via the exclusion of upstream open reading frames present in the 5' UTR of RETICULON-LIKE PROTEIN B1 mRNA, and subcellular localization via removal of the plastid transit peptide coding region from the mRNAs of HEME OXYGENASE 1 and SULFOQUINOVOSYLDIACYLGLYCEROL 2. Several alternative TSS also generated shorter transcripts lacking the coding potential for important domains. For example, under Pi deficiency the EVOLUTIONARILY CONSERVED C-TERMINAL REGION 4 (ECT4) locus, which encodes an N6-methyladenosine (m6A) reader, strongly expressed a short noncoding transcript (named ALTECT4), ~550 nt long, with a TSS in the penultimate intron. The specific and robust induction of ALTECT4 production by Pi deficiency led to the identification of a role for m6A readers in primary root growth in response to low phosphate that is dependent on iron and is involved in modulating cell division in the root meristem. Our results identify alternative TSS usage as an important process in the plant response to Pi deficiency.
Affiliation(s)
- Rodrigo S Reis
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
  - Institute of Plant Sciences, University of Bern, Bern, CH-3013, Switzerland
- Joaquín Clúa
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
- Aime Jaskolowski
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
- Jules Deforges
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
- Dominique Jacques-Vuarambon
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
  - Institute of Plant Sciences, University of Bern, Bern, CH-3013, Switzerland
- Nicolas Guex
  - Bioinformatics Competence Center, University of Lausanne, Lausanne, Switzerland
- Yves Poirier
  - Department of Plant Molecular Biology, University of Lausanne, Biophore Building, Lausanne, CH-1015, Switzerland
4
Carbonell-Sala S, Perteghella T, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H, Carninci P, Uszczynska-Ratajczak B, Guigó R. CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing. Nat Commun 2024; 15:5278. [PMID: 38937428 PMCID: PMC11211341 DOI: 10.1038/s41467-024-49523-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 06/10/2024] [Indexed: 06/29/2024] Open
Abstract
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5' cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
Affiliation(s)
- Sílvia Carbonell-Sala
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Tamara Perteghella
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Julien Lagarde
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Flomics Biotech, SL, Carrer de Roc Boronat 31, 08005, Barcelona, Catalonia, Spain
- Hiromi Nishiyori
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
- Emilio Palumbo
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Carme Arnan
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Hazuki Takahashi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
- Piero Carninci
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, Japan
  - Human Technopole, Milan, Italy
- Barbara Uszczynska-Ratajczak
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Roderic Guigó
  - Centre for Genomic Regulation (CRG), the Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
5
Zhang J, Hou W, Zhao Q, Xiao S, Linghu H, Zhang L, Du J, Cui H, Yang X, Ling S, Su J, Kong Q. Deep annotation of long noncoding RNAs by assembling RNA-seq and small RNA-seq data. J Biol Chem 2023; 299:105130. [PMID: 37543366 PMCID: PMC10498003 DOI: 10.1016/j.jbc.2023.105130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Revised: 07/20/2023] [Accepted: 07/31/2023] [Indexed: 08/07/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) are increasingly being recognized as modulators in various biological processes. However, due to their low expression, their systematic characterization is difficult. Here, we performed transcript annotation using a newly developed computational pipeline, termed RNA-seq and small RNA-seq combined strategy (RSCS), in a wide variety of cellular contexts. Thousands of high-confidence potential novel transcripts were identified by the RSCS, and the reliability of the transcriptome was verified by analysis of transcript structure, base composition, and sequence complexity. As evidenced by length comparison, the frequency of core promoter and polyadenylation signal motifs, and the locations of transcription start and end sites, the transcripts appear to be full length. Furthermore, taking advantage of our strategy, we identified a large number of endogenous retrovirus-associated lncRNAs, including a novel endogenous retrovirus-lncRNA that was functionally involved in the control of Yap1 expression and essential for early embryogenesis. In summary, the RSCS can generate a more complete and precise transcriptome, and our findings greatly expand the transcriptome annotation for the mammalian community.
Affiliation(s)
- Jiaming Zhang
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China; Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Weibo Hou
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Qi Zhao
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Songling Xiao
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Hongye Linghu
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Lixin Zhang
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Jiawei Du
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Hongdi Cui
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Xu Yang
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Shukuan Ling
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Jianzhong Su
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Qingran Kong
  - Oujiang Laboratory, Zhejiang Provincial Key Laboratory of Medical Genetics, Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, Zhejiang Province, China
6
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
7
Carbonell-Sala S, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H, Carninci P, Uszczynska-Ratajczak B, Guigó R. CapTrap-Seq: A platform-agnostic and quantitative approach for high-fidelity full-length RNA transcript sequencing. bioRxiv: the Preprint Server for Biology 2023:2023.06.16.543444. [PMID: 37398314 PMCID: PMC10312720 DOI: 10.1101/2023.06.16.543444] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/04/2023]
Abstract
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts, together with the data processing pipeline LyRic. We benchmarked CapTrap-seq and other popular RNA-seq library preparation protocols in a number of human tissues using both ONT and PacBio sequencing. To assess the accuracy of the transcript models produced, we introduced a capping strategy for synthetic RNA spike-in sequences that mimics natural 5' cap formation. We found that the vast majority (up to 90%) of transcript models that LyRic derives from CapTrap-seq reads are full-length. This makes it possible to produce highly accurate annotations with minimal human intervention.
8
Brochu H, Wang R, Tollison T, Pyo CW, Thomas A, Tseng E, Law L, Picker LJ, Gale M, Geraghty DE, Peng X. Alternative splicing and genetic variation of MHC-E: implications for rhesus cytomegalovirus-based vaccines. Commun Biol 2022; 5:1387. [PMID: 36536032 PMCID: PMC9762870 DOI: 10.1038/s42003-022-04344-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/05/2022] [Accepted: 12/06/2022] [Indexed: 12/23/2022] Open
Abstract
Rhesus cytomegalovirus (RhCMV)-based vaccination against simian immunodeficiency virus (SIV) elicits MHC-E-restricted CD8+ T cells that stringently control SIV infection in ~55% of vaccinated rhesus macaques (RM). However, it is unclear how accurately the RM model reflects HLA-E immunobiology in humans. Using long-read sequencing, we identified 16 Mamu-E isoforms, and all Mamu-E splicing junctions were detected among HLA-E isoforms in humans. We also obtained the complete Mamu-E genomic sequences covering the full coding regions of 59 RM from a RhCMV/SIV vaccine study. The Mamu-E gene was duplicated in 32 (54%) of 59 RM. Mamu-E alleles fell into four groups: three ~5% divergent full-length allele groups (G1, G2, G2_LTR) and a fourth monomorphic group (G3) with a deletion encompassing the canonical Mamu-E exon 6. The presence of G2_LTR alleles was significantly (p = 0.02) associated with the lack of RhCMV/SIV vaccine protection. These genomic resources will facilitate additional MHC-E targeted translational research.
Affiliation(s)
- Hayden Brochu
  - Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC, 27607, USA
  - Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC, 27695, USA
- Ruihan Wang
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Tammy Tollison
  - Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC, 27607, USA
- Chul-Woo Pyo
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Alexander Thomas
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Lynn Law
  - Department of Immunology, University of Washington, Seattle, WA, USA
  - Center for Innate Immunity and Immune Diseases, University of Washington, Seattle, WA, USA
- Louis J Picker
  - Vaccine and Gene Therapy Institute, Oregon Health & Science University, Beaverton, OR, 97006, USA
- Michael Gale
  - Department of Immunology, University of Washington, Seattle, WA, USA
  - Center for Innate Immunity and Immune Diseases, University of Washington, Seattle, WA, USA
  - Washington National Primate Research Center, University of Washington, Seattle, WA, USA
- Daniel E Geraghty
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Xinxia Peng
  - Department of Molecular Biomedical Sciences, North Carolina State University College of Veterinary Medicine, Raleigh, NC, 27607, USA
  - Bioinformatics Graduate Program, North Carolina State University, Raleigh, NC, 27695, USA
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
9
Tu M, Zeng J, Zhang J, Fan G, Song G. Unleashing the power within short-read RNA-seq for plant research: Beyond differential expression analysis and toward regulomics. Frontiers in Plant Science 2022; 13:1038109. [PMID: 36570898 PMCID: PMC9773216 DOI: 10.3389/fpls.2022.1038109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
RNA-seq has become a state-of-the-art technique for transcriptomic studies. Advances in both RNA-seq techniques and the corresponding analysis tools and pipelines have shaped our understanding of almost every aspect of plant sciences to an unprecedented degree. Notably, the integration of large amounts of RNA-seq data with other omic data sets in the model plants and major crop species has facilitated plant regulomics, while RNA-seq analysis is still used primarily for differential expression analysis in many less-studied plant species. To unleash the analytical power of RNA-seq in plant species, especially less-studied species and biomass crops, we summarize recent achievements of RNA-seq analysis in the major plant species and representative tools in four types of application: (1) transcriptome assembly, (2) construction of expression atlases, (3) network analysis, and (4) structural alteration. We emphasize the importance of expression atlases, coexpression networks and predictions of gene regulatory relationships in moving plant transcriptomes toward regulomics, an omic view of genome-wide transcription regulation. We highlight what can be achieved in plant research with RNA-seq by introducing a list of representative RNA-seq analysis tools and resources that were developed for certain minor species or are suitable for analysis without species limitation. In summary, we provide an updated digest of RNA-seq tools, resources and diverse applications for plant research, and our perspective on the power and challenges of short-read RNA-seq analysis from a regulomic point of view. Full utilization of these RNA-seq resources will promote plant omic research to a higher level, especially in less-studied species.
Affiliation(s)
- Min Tu
  - School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
- Jian Zeng
  - Guangdong Provincial Key Laboratory of Utilization and Conservation of Food and Medicinal Resources in Northern Region, Shaoguan University, Shaoguan, Guangdong, China
- Juntao Zhang
  - School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
- Guozhi Fan
  - School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
- Guangsen Song
  - School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
10
Shaw PJ, Kaewprommal P, Wongsombat C, Ngampiw C, Taechalertpaisarn T, Kamchonwongpaisan S, Tongsima S, Piriyapongsa J. Transcriptomic complexity of the human malaria parasite Plasmodium falciparum revealed by long-read sequencing. PLoS One 2022; 17:e0276956. [PMID: 36331983 PMCID: PMC9635732 DOI: 10.1371/journal.pone.0276956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
The Plasmodium falciparum human malaria parasite genome is incompletely annotated and does not accurately represent the transcriptomic diversity of this species. To address this need, we performed long-read transcriptomic sequencing. 5' capped mRNA was enriched from samples of total and nuclear-fractionated RNA from intra-erythrocytic stages and converted to cDNA libraries, which were sequenced on PacBio and Nanopore long-read platforms. In total, 12,495 novel isoforms were annotated from the data. Alternative 5' and 3' ends represent the majority of isoform events among the novel isoforms, with retained introns being the next most common event. The majority of alternative 5' ends correspond to genomic regions with features similar to those of the reference transcript 5' ends. However, a minority of alternative 5' ends showed markedly different features, including locations within protein-coding regions. Alternative 3' ends showed similar features to the reference transcript 3' ends, notably adenine-rich termination signals. Distinguishing features of retained introns could not be observed, except for a tendency towards shorter length and greater GC content compared with spliced introns. Expression of antisense and retained intron isoforms was detected at different intra-erythrocytic stages, suggesting developmental regulation of these isoform events. To gain insights into the possible functions of the novel isoforms, their protein-coding potential was assessed. Variants of P. falciparum proteins and novel proteins encoded by alternative open reading frames suggest that P. falciparum has a greater proteomic repertoire than the current annotation. We provide a catalog of annotated transcripts and encoded alternative proteins to support further studies on gene and protein regulation of this pathogen.
Affiliation(s)
- Philip J. Shaw
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Pavita Kaewprommal
  - National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Chayaphat Wongsombat
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Chumpol Ngampiw
  - National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Sumalee Kamchonwongpaisan
  - National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Sissades Tongsima
  - National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
- Jittima Piriyapongsa
  - National Biobank of Thailand (NBT), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
11
Bayega A, Oikonomopoulos S, Wang YC, Ragoussis J. Improved Nanopore full-length cDNA sequencing by PCR-suppression. Front Genet 2022; 13:1031355. [PMID: 36324505 PMCID: PMC9618600 DOI: 10.3389/fgene.2022.1031355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 09/30/2022] [Indexed: 11/29/2022] Open
Abstract
Full-length transcript sequencing remains a main goal of RNA sequencing. However, even long-read sequencing technologies such as Oxford Nanopore Technologies still fail to yield full-length transcript sequences for a significant portion of sequenced reads. Since these technologies can sequence reads that are far longer than the longest known processed transcripts, the failure to obtain full-length transcripts from good-quality RNA stems from library preparation inefficiency rather than the presence of degraded RNA molecules. It has previously been shown that addition of inverted terminal repeats in cDNA during reverse transcription, followed by single-primer PCR, creates a PCR suppression effect that prevents amplification of short molecules, thus enriching the library for longer transcripts. We adapted this method for Nanopore cDNA library preparation and show that not only is PCR efficiency increased but gene body coverage is dramatically improved. The results show that implementation of this simple strategy will result in better-quality full-length RNA sequencing data and make full-length transcript sequencing possible for most sequenced reads.
Affiliation(s)
- Anthony Bayega
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Spyros Oikonomopoulos
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Yu Chang Wang
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Jiannis Ragoussis
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
  - Department of Bioengineering, McGill University, Montréal, QC, Canada
12
|
Baratta AM, Brandner AJ, Plasil SL, Rice RC, Farris SP. Advancements in Genomic and Behavioral Neuroscience Analysis for the Study of Normal and Pathological Brain Function. Front Mol Neurosci 2022; 15:905328. [PMID: 35813067 PMCID: PMC9259865 DOI: 10.3389/fnmol.2022.905328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Accepted: 06/06/2022] [Indexed: 11/16/2022] Open
Abstract
Psychiatric and neurological disorders are influenced by an undetermined number of genes and molecular pathways that may differ among afflicted individuals. Functionally testing and characterizing biological systems is essential to discovering the interrelationship among candidate genes and understanding the neurobiology of behavior. Recent advancements in genetic, genomic, and behavioral approaches are revolutionizing modern neuroscience. Although these tools are often used separately for independent experiments, combining these areas of research will provide a viable avenue for multidimensional studies on the brain. Herein we will briefly review some of the available tools that have been developed for characterizing novel cellular and animal models of human disease. A major challenge will be openly sharing resources and datasets to effectively integrate seemingly disparate types of information and how these systems impact human disorders. However, as these emerging technologies continue to be developed and adopted by the scientific community, they will bring about unprecedented opportunities in our understanding of molecular neuroscience and behavior.
Affiliation(s)
- Annalisa M. Baratta
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Adam J. Brandner
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Sonja L. Plasil
- Department of Pharmacology & Chemical Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Rachel C. Rice
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Sean P. Farris
- Center for Neuroscience, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Anesthesiology and Perioperative Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- *Correspondence: Sean P. Farris,
|
13
|
Muyle AM, Seymour DK, Lv Y, Huettel B, Gaut BS. Gene-body methylation in plants: mechanisms, functions and important implications for understanding evolutionary processes. Genome Biol Evol 2022; 14:6550137. [PMID: 35298639 PMCID: PMC8995044 DOI: 10.1093/gbe/evac038] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Accepted: 03/11/2022] [Indexed: 11/13/2022] Open
Abstract
Gene body methylation (gbM) is an epigenetic mark where gene exons are methylated in the CG context only, as opposed to CHG and CHH contexts (where H stands for A, C, or T). CG methylation is transmitted transgenerationally in plants, opening the possibility that gbM may be shaped by adaptation. This presupposes, however, that gbM has a function that affects phenotype, which has been a topic of debate in the literature. Here, we review our current knowledge of gbM in plants. We start by presenting the well-elucidated mechanisms of plant gbM establishment and maintenance. We then review more controversial topics: the evolution of gbM and the potential selective pressures that act on it. Finally, we discuss the potential functions of gbM that may affect organismal phenotypes: gene expression stabilization and upregulation, inhibition of aberrant transcription (reverse and internal), prevention of aberrant intron retention, and protection against TE insertions. To bolster the review of these topics, we include novel analyses to assess the effect of gbM on transcripts. Overall, a growing body of literature finds that gbM correlates with levels and patterns of gene expression. It is not clear, however, if this is a causal relationship. Altogether, functional work suggests that the effects of gbM, if any, must be relatively small, but there is nonetheless evidence that it is shaped by natural selection. We conclude by discussing the potential adaptive character of gbM and its implications for an updated view of the mechanisms of adaptation in plants.
Affiliation(s)
- Yuanda Lv
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Bruno Huettel
- Max Planck Genome Centre Cologne, Max Planck Institute for Plant Breeding, Cologne, Germany
|
14
|
Ergin S, Kherad N, Alagoz M. RNA sequencing and its applications in cancer and rare diseases. Mol Biol Rep 2022; 49:2325-2333. [PMID: 34988891 PMCID: PMC8731134 DOI: 10.1007/s11033-021-06963-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/11/2021] [Accepted: 11/16/2021] [Indexed: 12/19/2022]
Abstract
With the invention of RNA sequencing over a decade ago, the diagnosis and identification of gene-related diseases entered a new phase that enabled more accurate analysis of diseases that are difficult to approach and analyze. RNA sequencing has enabled in-depth study of transcriptomes in different species and provided a better understanding of rare diseases and of the taxonomic classification of various eukaryotic organisms. The development of single-cell, short-read, long-read and direct RNA sequencing using both blood and biopsy specimens, together with recent advances in computational analysis programs, has become indispensable to medical professionals' ability to identify the origin and cause of genetic disorders. Altogether, these advantages have advanced treatment design, since RNA sequencing can detect genes that confer resistance to existing therapies and help medical professionals improve treatments towards higher effectiveness and fewer side effects. It is therefore essential for researchers and scientists to have deeper insight into all available methods of RNA sequencing when designing therapies.
Affiliation(s)
- Selvi Ergin
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
- Nasim Kherad
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
- Meryem Alagoz
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
|
15
|
Comprehensive transcriptome characterization of Grus japonensis using PacBio SMRT and Illumina sequencing. Sci Rep 2021; 11:23927. [PMID: 34907275 PMCID: PMC8671462 DOI: 10.1038/s41598-021-03474-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/26/2021] [Accepted: 12/03/2021] [Indexed: 12/13/2022] Open
Abstract
The red-crowned crane (Grus japonensis) is an endangered species distributed across southeast Russia, northeast China, Korea, and Japan. Here, we sequenced for the first time the full-length unreferenced transcriptome of red-crowned crane mixed samples using a PacBio Sequel platform. A total of 359,136 circular consensus sequences (CCS) were obtained via clustering to remove redundancy. A total of 303,544 full-length non-chimeric sequences were identified by judging whether CCS contained 5' and 3' adapters and the poly(A) tail. Eight samples were sequenced using Illumina, and PacBio sequencing data were corrected according to the collected Illumina data to obtain more accurate full-length transcripts. A total of 4,100 long non-coding RNAs, 13,115 simple sequence repeat loci and 29 transcription factor families were identified. Expression of lncRNAs and TFs was lowest in the pancreas compared with other tissues. Many enriched immune-related transmission pathways (MHC and IL receptors) were identified in the spleen. This study will contribute to a better understanding of the gene structure and post-transcriptional regulatory network, and provide references for future studies on red-crowned cranes.
|
16
|
Wieben ED, Aleff RA, Rinkoski TA, Baratz KH, Basu S, Patel SV, Maguire LJ, Fautsch MP. Comparison of TCF4 repeat expansion length in corneal endothelium and leukocytes of patients with Fuchs endothelial corneal dystrophy. PLoS One 2021; 16:e0260837. [PMID: 34855896 PMCID: PMC8638873 DOI: 10.1371/journal.pone.0260837] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/27/2021] [Accepted: 11/17/2021] [Indexed: 12/13/2022] Open
Abstract
Expansion of CTG trinucleotide repeats (TNR) in the transcription factor 4 (TCF4) gene is highly associated with Fuchs Endothelial Corneal Dystrophy (FECD). Due to limitations in the availability of DNA from diseased corneal endothelium, sizing of CTG repeats in FECD patients has typically been determined using DNA samples isolated from peripheral blood leukocytes. However, it is not feasible to extract enough DNA from surgically isolated FECD corneal endothelial tissue to determine repeat length based on current technology. To circumvent this issue, total RNA was isolated from FECD corneal endothelium and sequenced using long-read sequencing. DNA samples isolated from primary cultures of corneal endothelium from these same affected individuals were also assessed by Southern blotting. Both long-read sequencing and Southern blot analysis showed significantly longer CTG TNR expansion (>1000 repeats) in the corneal endothelium from FECD patients than those characterized in leukocytes from the same individuals (<90 repeats). Our findings suggest that the TCF4 CTG repeat expansions in the FECD corneal endothelium are much longer than those found in leukocytes.
Affiliation(s)
- Eric D. Wieben
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
- Ross A. Aleff
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, Minnesota, United States of America
- Tommy A. Rinkoski
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota, United States of America
- Keith H. Baratz
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota, United States of America
- Shubham Basu
- Division of Biostatistics and Bioinformatics and Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Sanjay V. Patel
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota, United States of America
- Leo J. Maguire
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota, United States of America
- Michael P. Fautsch
- Department of Ophthalmology, Mayo Clinic, Rochester, Minnesota, United States of America
|
17
|
de Medeiros Oliveira M, Bonadio I, Lie de Melo A, Mendes Souza G, Durham AM. TSSFinder-fast and accurate ab initio prediction of the core promoter in eukaryotic genomes. Brief Bioinform 2021; 22:bbab198. [PMID: 34050351 PMCID: PMC8574697 DOI: 10.1093/bib/bbab198] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Revised: 02/14/2021] [Accepted: 02/23/2021] [Indexed: 12/02/2022] Open
Abstract
Promoter annotation is an important task in the analysis of a genome. One of the main challenges for this task is locating the border between the promoter region and the transcribing region of the gene, the transcription start site (TSS). The TSS is the reference point to delimit the DNA sequence responsible for the assembly of the transcribing complex. Because the same gene can have more than one TSS, delimiting the promoter region requires locating the TSS closest to the translation start site. This paper presents TSSFinder, new software for the prediction of the TSS signal of eukaryotic genes that is significantly more accurate than other available software. TSSFinder is currently the only application to offer pre-trained models for six different eukaryotic organisms: Arabidopsis thaliana, Drosophila melanogaster, Gallus gallus, Homo sapiens, Oryza sativa and Saccharomyces cerevisiae. Additionally, the software can be easily customized for specific organisms using only 125 DNA sequences with a validated TSS signal and corresponding genomic locations as a training set. TSSFinder is a valuable new tool for the annotation of genomes. TSSFinder source code and docker container can be downloaded from http://tssfinder.github.io. Alternatively, TSSFinder is also available as a web service at http://sucest-fun.org/wsapp/tssfinder/.
Affiliation(s)
- Igor Bonadio
- Data Science, Elo7 Research Lab, São Paulo, Brazil
|
18
|
Wang Y, Zhao Y, Bollas A, Wang Y, Au KF. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 537] [Impact Index Per Article: 179.0] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
Affiliation(s)
- Yunhao Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yue Zhao
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
- Audrey Bollas
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Yuru Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Biomedical Informatics Shared Resources, The Ohio State University, Columbus, OH, USA
|
19
|
Chung M, Bruno VM, Rasko DA, Cuomo CA, Muñoz JF, Livny J, Shetty AC, Mahurkar A, Dunning Hotopp JC. Best practices on the differential expression analysis of multi-species RNA-seq. Genome Biol 2021; 22:121. [PMID: 33926528 PMCID: PMC8082843 DOI: 10.1186/s13059-021-02337-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 10/21/2020] [Accepted: 04/01/2021] [Indexed: 02/07/2023] Open
Abstract
Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.
Affiliation(s)
- Matthew Chung
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Vincent M. Bruno
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- David A. Rasko
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Christina A. Cuomo
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
- José F. Muñoz
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
- Jonathan Livny
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA 02142 USA
- Amol C. Shetty
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Anup Mahurkar
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Julie C. Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Greenebaum Cancer Center, University of Maryland, Baltimore, MD 21201 USA
|
20
|
Islas-Flores T, Galán-Vásquez E, Villanueva MA. Screening a Spliced Leader-Based Symbiodinium microadriaticum cDNA Library Using the Yeast-Two Hybrid System Reveals a Hemerythrin-Like Protein as a Putative SmicRACK1 Ligand. Microorganisms 2021; 9:microorganisms9040791. [PMID: 33918967 PMCID: PMC8070245 DOI: 10.3390/microorganisms9040791] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2021] [Revised: 03/13/2021] [Accepted: 03/16/2021] [Indexed: 11/16/2022] Open
Abstract
The dinoflagellate Symbiodiniaceae family plays a central role in the health of the coral reef ecosystem via the symbiosis that establishes with its inhabiting cnidarians and supports the host metabolism. In the last few decades, coral reefs have been threatened by pollution and rising temperatures which have led to coral loss. These events have raised interest in studying Symbiodiniaceae and their hosts; however, progress in understanding their metabolism, signal transduction pathways, and physiology in general, has been slow because dinoflagellates present peculiar characteristics. We took advantage of one of these peculiarities; namely, the post-transcriptional addition of a Dino Spliced Leader (Dino-SL) to the 5' end of the nuclear mRNAs, and used it to generate cDNA libraries from Symbiodinium microadriaticum. We compared sequences from two Yeast-Two Hybrid System cDNA Libraries, one based on the Dino-SL sequence, and the other based on the SMART technology (Switching Mechanism at 5' end of RNA Transcript) which exploits the template switching function of the reverse transcriptase. Upon comparison of the performance of both libraries, we obtained a significantly higher yield, number and length of sequences, number of transcripts, and better 5' representation from the Dino-SL based library than from the SMART library. In addition, we confirmed that the cDNAs from the Dino-SL library were adequately expressed in the yeast cells used for the Yeast-Two Hybrid System which resulted in successful screening for putative SmicRACK1 ligands, which yielded a putative hemerythrin-like protein.
Affiliation(s)
- Tania Islas-Flores
- Unidad Académica de Sistemas Arrecifales, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, UNAM, Prolongación Avenida Niños Héroes S/N, Puerto Morelos, Quintana Roo 77580, México
- Correspondence: (T.I.-F.); (M.A.V.); Tel.: +52-998-871-0009 (T.I.-F. & M.A.V.)
- Edgardo Galán-Vásquez
- Departamento de Ingeniería de Sistemas Computacionales y Automatización, Instituto de Investigación en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, UNAM, Circuito Escolar 3000, Ciudad Universitaria, Ciudad de México CP 04510, México
- Marco A. Villanueva
- Unidad Académica de Sistemas Arrecifales, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, UNAM, Prolongación Avenida Niños Héroes S/N, Puerto Morelos, Quintana Roo 77580, México
- Correspondence: (T.I.-F.); (M.A.V.); Tel.: +52-998-871-0009 (T.I.-F. & M.A.V.)
|
21
|
Fernandez‐Pozo N, Metz T, Chandler JO, Gramzow L, Mérai Z, Maumus F, Mittelsten Scheid O, Theißen G, Schranz ME, Leubner‐Metzger G, Rensing SA. Aethionema arabicum genome annotation using PacBio full-length transcripts provides a valuable resource for seed dormancy and Brassicaceae evolution research. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 106:275-293. [PMID: 33453123 PMCID: PMC8641386 DOI: 10.1111/tpj.15161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/13/2020] [Revised: 12/31/2020] [Accepted: 01/08/2021] [Indexed: 05/06/2023]
Abstract
Aethionema arabicum is an important model plant for Brassicaceae trait evolution, particularly of seed (development, regulation, germination, dormancy) and fruit (development, dehiscence mechanisms) characters. Its genome assembly was recently improved but the gene annotation was not updated. Here, we improved the Ae. arabicum gene annotation using 294 RNA-seq libraries and 136 307 full-length PacBio Iso-seq transcripts, increasing BUSCO completeness by 11.6% and featuring 5606 additional genes. Analysis of orthologs showed a lower number of genes in Ae. arabicum than in other Brassicaceae, which could be partially explained by loss of homeologs derived from the At-α polyploidization event and by a lower occurrence of tandem duplications after divergence of Aethionema from the other Brassicaceae. Benchmarking of MADS-box genes identified orthologs of FUL and AGL79 not found in previous versions. Analysis of full-length transcripts related to ABA-mediated seed dormancy discovered a conserved isoform of PIF6-β and antisense transcripts in ABI3, ABI4 and DOG1, among other cases found of different alternative splicing between Turkey and Cyprus ecotypes. The presented data allow alternative splicing mining and proposition of numerous hypotheses to research evolution and functional genomics. Annotation data and sequences are available at the Ae. arabicum DB (https://plantcode.online.uni-marburg.de/aetar_db).
Affiliation(s)
- Noe Fernandez‐Pozo
- Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
- Timo Metz
- Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
- Jake O. Chandler
- School of Biological Sciences, Royal Holloway University of London, Egham, Surrey, UK
- Lydia Gramzow
- Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Jena, Germany
- Zsuzsanna Mérai
- Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
- Ortrun Mittelsten Scheid
- Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
- Günter Theißen
- Matthias Schleiden Institute/Genetics, Friedrich Schiller University Jena, Jena, Germany
- M. Eric Schranz
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands
- Gerhard Leubner‐Metzger
- School of Biological Sciences, Royal Holloway University of London, Egham, Surrey, UK
- Laboratory of Growth Regulators, Centre of the Region Haná for Biotechnological and Agricultural Research, Palacký University and Institute of Experimental Botany, Academy of Sciences of the Czech Republic, Olomouc, Czech Republic
- Stefan A. Rensing
- Plant Cell Biology, Department of Biology, University of Marburg, Marburg, Germany
- BIOSS Centre for Biological Signaling Studies, University of Freiburg, Freiburg, Germany
- LOEWE Center for Synthetic Microbiology (SYNMIKRO), University of Marburg, Marburg, Germany
|
22
|
Mouchbahani-Constance S, Sharif-Naeini R. Proteomic and Transcriptomic Techniques to Decipher the Molecular Evolution of Venoms. Toxins (Basel) 2021; 13:154. [PMID: 33669432 PMCID: PMC7920473 DOI: 10.3390/toxins13020154] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/16/2021] [Revised: 02/06/2021] [Accepted: 02/10/2021] [Indexed: 12/24/2022] Open
Abstract
Nature's library of venoms is a vast and untapped resource that has the potential of becoming the source of a wide variety of new drugs and therapeutics. The discovery of these valuable molecules, hidden in diverse collections of different venoms, requires highly specific genetic and proteomic sequencing techniques. These have been used to sequence a variety of venom glands from species ranging from snakes to scorpions, and some marine species. In addition to identifying toxin sequences, these techniques have paved the way for identifying various novel evolutionary links between species that were previously thought to be unrelated. Furthermore, proteomics-based techniques have allowed researchers to discover how specific toxins have evolved within related species, and in the context of environmental pressures. These techniques allow groups to discover novel proteins, identify mutations of interest, and discover new ways to modify toxins for biomimetic purposes and for the development of new therapeutics.
Affiliation(s)
- Reza Sharif-Naeini
- Department of Physiology and Cell Information Systems Group, Alan Edwards Center for Research on Pain, McGill University, Montreal, QC H3A 0G4, Canada
|
23
|
Islam MA, Rony SA, Rahman MB, Cinar MU, Villena J, Uddin MJ, Kitazawa H. Improvement of Disease Resistance in Livestock: Application of Immunogenomics and CRISPR/Cas9 Technology. Animals (Basel) 2020; 10:E2236. [PMID: 33260762 PMCID: PMC7761152 DOI: 10.3390/ani10122236] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/20/2020] [Revised: 11/18/2020] [Accepted: 11/26/2020] [Indexed: 01/09/2023] Open
Abstract
Disease occurrence adversely affects livestock production and animal welfare, and has an impact on both human health and public perception of food-animal production. Farmers, animal scientists, and veterinarians continue to work together to explore effective disease control approaches for the production of safe animal-derived food. Implementing immunogenomics, along with genome editing technology, is considered a key approach for safe food-animal production through the improvement of host genetic resistance. Next-generation sequencing, as a cutting-edge technique, enables the production of high-throughput transcriptomic and genomic profiles resulting from host-pathogen interactions. Immunogenomics combines the transcriptomic and genomic data linked to host resistance to disease and predicts potential candidate genes and their genomic locations. Genome editing, which involves insertion, deletion, or modification of one or more genes in the DNA sequence, is advancing rapidly and may become a commercial reality sooner than expected. The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) [CRISPR/Cas9] system has recently emerged as a powerful tool for genome editing in agricultural food production, including livestock disease management. CRISPR/Cas9-mediated insertion of the NRAMP1 gene to produce tuberculosis-resistant cattle and deletion of the CD163 gene to produce porcine reproductive and respiratory syndrome (PRRS)-resistant pigs are two groundbreaking applications of genome editing in livestock. In this review, we highlight the technological advances in livestock immunogenomics and the principles and scope of application of CRISPR/Cas9-mediated targeted genome editing in animal breeding for disease resistance.
Affiliation(s)
- Md. Aminul Islam
- Department of Medicine, Faculty of Veterinary Science, Bangladesh Agricultural University, Mymensingh 2202, Bangladesh;
- Food and Feed Immunology Group, Graduate School of Agricultural University Science, Tohoku University, Sendai 980-8572, Japan;
- Livestock Immunology Unit, International Research and Education Centre for Food and Agricultural Immunology (CFAI), Graduate School of Agricultural Science, Tohoku University, Sendai 980-8572, Japan
| | - Sharmin Aqter Rony
- Department of Parasitology, Faculty of Veterinary Science, Bangladesh Agricultural University, Mymensingh 2202, Bangladesh;
| | - Mohammad Bozlur Rahman
- Department of Livestock Services, Krishi Khamar Sarak, Farmgate, Dhaka 1215, Bangladesh;
| | - Mehmet Ulas Cinar
- Department of Animal Science, Faculty of Agriculture, Erciyes University, 38039 Kayseri, Turkey;
- Department of Veterinary Microbiology & Pathology, College of Veterinary Medicine, Washington State University, Pullman, WA 99164, USA
| | - Julio Villena
- Food and Feed Immunology Group, Graduate School of Agricultural University Science, Tohoku University, Sendai 980-8572, Japan;
- Laboratory of Immunobiotechnology, Reference Centre for Lactobacilli, (CERELA), Tucuman 4000, Argentina
| | - Muhammad Jasim Uddin
- Department of Medicine, Faculty of Veterinary Science, Bangladesh Agricultural University, Mymensingh 2202, Bangladesh;
- School of Veterinary Science, Gatton Campus, The University of Queensland, Brisbane 4072, Australia
| | - Haruki Kitazawa
- Food and Feed Immunology Group, Graduate School of Agricultural University Science, Tohoku University, Sendai 980-8572, Japan;
- Livestock Immunology Unit, International Research and Education Centre for Food and Agricultural Immunology (CFAI), Graduate School of Agricultural Science, Tohoku University, Sendai 980-8572, Japan
| |
|
24
|
Hon T, Mars K, Young G, Tsai YC, Karalius JW, Landolin JM, Maurer N, Kudrna D, Hardigan MA, Steiner CC, Knapp SJ, Ware D, Shapiro B, Peluso P, Rank DR. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data 2020; 7:399. [PMID: 33203859 PMCID: PMC7673114 DOI: 10.1038/s41597-020-00743-4] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 05/12/2020] [Accepted: 10/27/2020] [Indexed: 02/06/2023] Open
Abstract
The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
- Measurement(s): DNA • genome • Metagenome
- Technology Type(s): DNA sequencing • PacBio Sequel System
- Factor Type(s): organism that had its genome sequenced
- Sample Characteristic - Organism: Mus musculus • Rana muscosa • Fragaria x ananassa • Zea mays
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12855527
Affiliation(s)
- Ting Hon
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - Kristin Mars
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - Greg Young
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - Yu-Chih Tsai
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - Joseph W Karalius
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - Jane M Landolin
- Ravel Biotechnology Inc., 953 Indiana St., San Francisco, CA, 94107, USA
| | - Nicholas Maurer
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
| | - David Kudrna
- Arizona Genomics Institute and School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA
| | - Michael A Hardigan
- Department of Plant Sciences, University of California, Davis, One Shields Ave, Davis, CA, 95616-8571, USA
| | - Cynthia C Steiner
- Conservation Genetics, Beckman Center for Conservation Research, San Diego Zoo Global, 15600 San Pasqual Valley Road, Escondido, CA, 92027, USA
| | - Steven J Knapp
- Department of Plant Sciences, University of California, Davis, One Shields Ave, Davis, CA, 95616-8571, USA
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA.,USDA-ARS, Plant, Soil, and Nutrition Research Unit, Ithaca, NY, 14853, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, 95064, USA.,Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
| | - Paul Peluso
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA
| | - David R Rank
- Pacific Biosciences of California Inc., 1305 O'Brien Dr., Menlo Park, CA, 94025, USA.
| |
|
25
|
Long-read RNA sequencing of human and animal filarial parasites improves gene models and discovers operons. PLoS Negl Trop Dis 2020; 14:e0008869. [PMID: 33196647 PMCID: PMC7704054 DOI: 10.1371/journal.pntd.0008869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Revised: 11/30/2020] [Accepted: 10/09/2020] [Indexed: 01/01/2023] Open
Abstract
Filarial parasitic nematodes (Filarioidea) cause substantial disease burden to humans and animals around the world. Recently there has been a coordinated global effort to generate, annotate, and curate genomic data from nematode species of medical and veterinary importance. This has resulted in two chromosome-level assemblies (Brugia malayi and Onchocerca volvulus) and 11 additional draft genomes from Filarioidea. These reference assemblies facilitate comparative genomics to explore basic helminth biology and prioritize new drug and vaccine targets. While the continual improvement of genome contiguity and completeness advances these goals, experimental functional annotation of genes is often hindered by poor gene models. Short-read RNA sequencing data and expressed sequence tags, in cooperation with ab initio prediction algorithms, are employed for gene prediction, but these can result in missing clade-specific genes, fragmented models, imperfect mapping of gene ends, and lack of isoform resolution. Long-read RNA sequencing can overcome these drawbacks and greatly improve gene model quality. Here, we present Iso-Seq data for B. malayi and Dirofilaria immitis, etiological agents of lymphatic filariasis and canine heartworm disease, respectively. These data cover approximately half of the known coding genomes and substantially improve gene models by extending untranslated regions, cataloging novel splice junctions from novel isoforms, and correcting mispredicted junctions. Furthermore, we validated computationally predicted operons, manually curated new operons, and merged fragmented gene models. We carried out analyses of poly(A) tails in both species, leading to the identification of non-canonical poly(A) signals. Finally, we prioritized and assessed known and putative anthelmintic targets, correcting or validating gene models for molecular cloning and target-based anthelmintic screening efforts. 
Overall, these data significantly improve the catalog of gene models for two important parasites, and they demonstrate how long-read RNA sequencing should be prioritized for ongoing improvement of parasitic nematode genome assemblies.
Filarial parasitic nematodes are vector-borne parasites that infect humans and animals. Brugia malayi and Dirofilaria immitis are transmitted by mosquitoes and cause human lymphatic filariasis and canine heartworm disease, respectively. Recent years have seen a dramatic increase in genomic and transcriptomic data sets and the concomitant increase in innovative strategies for drug target identification, validation, and screening. However, while the completeness of genome assemblies of filarial parasitic nematodes has seen steady improvements, the reliability of gene models has not kept pace, hindering cloning efforts. Long-read RNA sequencing technologies are uniquely able to improve gene models, but have not been widely used for the causative agents of neglected tropical diseases. Here, we report the improvement of gene models in both B. malayi and D. immitis by long-read RNA sequencing. We identified novel operons, deprecated false positive operons, identified dozens of novel genes, and described the parameters of polyadenylation. We also focused on putative anthelmintic targets, identifying novel isoforms and correcting gene models. These data substantially increase the trustworthiness of gene models in these two species and demonstrate how long-read sequencing approaches should be prioritized in the continued improvement of genome assemblies and their gene annotations.
|
26
|
Xie C, Bekpen C, Künzel S, Keshavarz M, Krebs-Wheaton R, Skrabar N, Ullrich KK, Zhang W, Tautz D. Dedicated transcriptomics combined with power analysis lead to functional understanding of genes with weak phenotypic changes in knockout lines. PLoS Comput Biol 2020; 16:e1008354. [PMID: 33180766 PMCID: PMC7685438 DOI: 10.1371/journal.pcbi.1008354] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/20/2020] [Revised: 11/24/2020] [Accepted: 09/20/2020] [Indexed: 12/26/2022] Open
Abstract
Systematic knockout studies in mice have shown that a large fraction of the gene replacements show no lethal or other overt phenotypes. This has led to the development of more refined analysis schemes, including physiological, behavioral, developmental and cytological tests. However, transcriptomic analyses have not yet been systematically evaluated for non-lethal knockouts. We conducted a power analysis to determine the experimental conditions under which even small changes in transcript levels can be reliably traced. We have applied this to two gene disruption lines of genes for which no function was known so far. Dedicated phenotyping tests informed by the tissues and stages of highest expression of the two genes show small effects on the tested phenotypes. For the transcriptome analysis of these stages and tissues, we used a prior power analysis to determine the number of biological replicates and the sequencing depth. We find that under these conditions, the knockouts have a significant impact on the transcriptional networks, with thousands of genes showing small transcriptional changes. GO analysis suggests that A930004D18Rik is involved in developmental processes through contributing to protein complexes, and A830005F24Rik in extracellular matrix functions. Subsampling analysis of the data reveals that the increase in the number of biological replicates was more important than increasing the sequencing depth to arrive at these results. Hence, our proof-of-principle experiment suggests that transcriptomic analysis is indeed an option to study gene functions of genes with weak or no traceable phenotypic effects and it provides the boundary conditions under which this is possible.
Knockout mice benefit the understanding of gene functions in mammals. However, for many genes it has proven difficult to identify clear phenotypes, due to a lack of sufficient assays. As Lewis Wolpert put it in a famous quote, “But did you take them to the opera?”, metaphorically alluding to the need to extend phenotyping efforts. This insight led to the establishment of phenotyping pipelines that are nowadays routinely used to characterize knock-out lines. However, transcriptomic approaches based on RNA-Seq have been much less explored for such deep-level studies. Here we conducted both a theoretical power analysis and practical RNA-Seq experiments on two knockout lines with small phenotypic effects to investigate parameters including sample size, sequencing depth, fold change, and dispersion. Our dedicated RNA-Seq studies discovered thousands of genes with small transcriptional changes that were enriched in specific functions in both knockout lines. We find that it is more important to increase the number of samples than to increase the sequencing depth. Our work shows that a deep RNA-Seq study on knockouts is powerful for understanding gene functions in cases of weak phenotypic effects, and provides a guideline for the experimental design of such studies.
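The replicate-versus-depth trade-off described above can be illustrated with a widely used normal approximation for negative binomial read counts. This is a minimal sketch and not the authors' power analysis: the function name `approx_power`, the example parameter values, and the use of an unadjusted significance level (no multiple-testing correction) are assumptions made purely for illustration.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(n_per_group, mean_count, dispersion, fold_change, alpha=0.05):
    """Approximate power to detect a fold change for a single gene.

    Assumes negative binomial counts, so the variance of the estimated
    log mean in one group is roughly (1/mean_count + dispersion) / n.
    All argument names are hypothetical, chosen for this sketch.
    """
    # Standard error of the log fold change between two equal-sized groups.
    se = sqrt(2.0 * (1.0 / mean_count + dispersion) / n_per_group)
    # Two-sided critical value at the chosen (unadjusted) significance level.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic exceeds the critical value.
    return NormalDist().cdf(abs(log(fold_change)) / se - z_crit)


if __name__ == "__main__":
    # Illustrative values only: a small 1.2-fold change, moderate dispersion.
    base = approx_power(5, 100, 0.05, 1.2)
    deeper = approx_power(5, 200, 0.05, 1.2)      # double the sequencing depth
    more_reps = approx_power(10, 100, 0.05, 1.2)  # double the replicates
    print(f"baseline={base:.3f} deeper={deeper:.3f} more_replicates={more_reps:.3f}")
```

Under this approximation, extra depth only shrinks the 1/mean term, whereas the dispersion term is reduced only by adding replicates, which is in line with the subsampling result reported in the abstract.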
Affiliation(s)
- Chen Xie
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- * E-mail:
| | - Cemalettin Bekpen
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Sven Künzel
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Maryam Keshavarz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Rebecca Krebs-Wheaton
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Neva Skrabar
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Kristian K. Ullrich
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Wenyu Zhang
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Diethard Tautz
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| |
|
27
|
Kuo RI, Cheng Y, Zhang R, Brown JWS, Smith J, Archibald AL, Burt DW. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genomics 2020; 21:751. [PMID: 33126848 PMCID: PMC7596999 DOI: 10.1186/s12864-020-07123-7] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 04/15/2020] [Accepted: 10/06/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: The human transcriptome annotation is regarded as one of the most complete of any eukaryotic species. However, limitations in sequencing technologies have biased the annotation toward multi-exonic protein coding genes. Accurate high-throughput long read transcript sequencing can now provide additional evidence for rare transcripts and genes such as mono-exonic and non-coding genes that were previously either undetectable or impossible to differentiate from sequencing noise.
RESULTS: We developed the Transcriptome Annotation by Modular Algorithms (TAMA) software to leverage the power of long read transcript sequencing and address the issues with current data processing pipelines. TAMA achieved high sensitivity and precision for gene and transcript model predictions in both reference guided and unguided approaches in our benchmark tests using simulated Pacific Biosciences (PacBio) and Nanopore sequencing data and real PacBio datasets. By analyzing PacBio Sequel II Iso-Seq sequencing data of the Universal Human Reference RNA (UHRR) using TAMA and other commonly used tools, we found that the convention of using alignment identity to measure error correction performance does not reflect actual gain in accuracy of predicted transcript models. In addition, inter-read error correction can cause major changes to read mapping, resulting in potentially over 6 K erroneous gene model predictions in the Iso-Seq based human genome annotation. Using TAMA's genome assembly based error correction and gene feature evidence, we predicted 2566 putative novel non-coding genes and 1557 putative novel protein coding gene models.
CONCLUSIONS: Long read transcript sequencing data has the power to identify novel genes within the highly annotated human genome. The use of parameter tuning and extensive output information of the TAMA software package allows for in depth exploration of eukaryotic transcriptomes. We have found long read data based evidence for thousands of unannotated genes within the human genome. More development in sequencing library preparation and data processing is required for differentiating sequencing noise from real genes in long read RNA sequencing data.
Affiliation(s)
- Richard I Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK.
| | - Yuanyuan Cheng
- The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
- School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
| | - Runxuan Zhang
- Information and Computational Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, UK
| | - John W S Brown
- Plant Sciences Division, School of Life Sciences, University of Dundee, Invergowrie, Dundee, Scotland, UK
- Cell and Molecular Sciences, The James Hutton Institute, Invergowrie, Dundee, Scotland, UK
| | - Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
| | - Alan L Archibald
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
| | - David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
- The University of Queensland, St. Lucia, Brisbane, QLD, 4072, Australia
| |
|
28
|
Oikonomopoulos S, Bayega A, Fahiminiya S, Djambazian H, Berube P, Ragoussis J. Methodologies for Transcript Profiling Using Long-Read Technologies. Front Genet 2020; 11:606. [PMID: 32733532 PMCID: PMC7358353 DOI: 10.3389/fgene.2020.00606] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 01/08/2020] [Accepted: 05/19/2020] [Indexed: 12/28/2022] Open
Abstract
RNA sequencing using next-generation sequencing technologies (NGS) is currently the standard approach for gene expression profiling, particularly for large-scale high-throughput studies. NGS technologies comprise high throughput, cost efficient short-read RNA-Seq, while emerging single molecule, long-read RNA-Seq technologies have enabled new approaches to study the transcriptome and its function. The emerging single molecule, long-read technologies are currently commercially available from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), while new methodologies based on short-read sequencing approaches are also being developed in order to provide long range single molecule level information, for example the 10x Genomics linked read methodology. The shift toward long-read sequencing technologies for transcriptome characterization is based on current increases in throughput and decreases in cost, making these attractive for de novo transcriptome assembly, isoform expression quantification, and in-depth RNA species analysis. These types of analyses were challenging with standard short sequencing approaches, due to the complex nature of the transcriptome, which consists of variable lengths of transcripts and multiple alternatively spliced isoforms for most genes, as well as the high sequence similarity of highly abundant species of RNA, such as rRNAs. Here we aim to focus on single molecule level sequencing technologies and single-cell technologies that, combined with perturbation tools, allow the analysis of complete RNA species, whether short or long, at high resolution. In parallel, these tools have opened new ways of understanding gene functions at the tissue, network, and pathway levels, as well as their detailed functional characterization. 
Analysis of the epi-transcriptome, including RNA methylation and modification and the effects of such modifications on biological systems is now enabled through direct RNA sequencing instead of classical indirect approaches. However, many difficulties and challenges remain, such as methodologies to generate full-length RNA or cDNA libraries from all different species of RNAs, not only poly-A containing transcripts, and the identification of allele-specific transcripts due to current error rates of single molecule technologies, while the bioinformatics analysis on long-read data for accurate identification of 5' and 3' UTRs is still in development.
Affiliation(s)
- Spyros Oikonomopoulos
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Anthony Bayega
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Somayyeh Fahiminiya
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Haig Djambazian
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Pierre Berube
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Jiannis Ragoussis
- McGill Genome Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
- Department of Bioengineering, McGill University, Montréal, QC, Canada
| |
|
29
|
Yao S, Liang F, Gill RA, Huang J, Cheng X, Liu Y, Tong C, Liu S. A global survey of the transcriptome of allopolyploid Brassica napus based on single-molecule long-read isoform sequencing and Illumina-based RNA sequencing data. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2020; 103:843-857. [PMID: 32270540 DOI: 10.1111/tpj.14754] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 06/21/2019] [Revised: 03/15/2020] [Accepted: 03/18/2020] [Indexed: 05/21/2023]
Abstract
Brassica napus is a recent allopolyploid derived from the hybridization of Brassica rapa (ArAr) and Brassica oleracea (CoCo). Because of the high sequence similarity between the An and Cn subgenomes, it is difficult to provide an accurate landscape of the whole transcriptome of B. napus. To overcome this problem, we applied a single-molecule long-read isoform sequencing (Iso-Seq) technique that can produce long reads to explore the complex transcriptome of B. napus at the isoform level. From the Iso-Seq data, we obtained 147 698 non-redundant isoforms, capturing 37 403 annotated genes. A total of 18.1% (14 934/82 367) of the multi-exonic genes showed alternative splicing (AS). In addition, we identified 549 long non-coding RNAs, the majority of which displayed tissue-specific expression profiles, and detected 7742 annotated genes that possessed isoforms containing alternative polyadenylation sites. Moreover, 31 591 AS events located in open reading frames (ORFs) lead to potential protein isoforms by in-frame or frameshift changes in the ORF. Illumina RNA sequencing of five tissues that were pooled for Iso-Seq was also performed and showed that 69% of the AS events were tissue-specific. Our data provide abundant transcriptome resources for a transcript isoform catalog of B. napus, which will facilitate genome reannotation, strengthen our understanding of the B. napus transcriptome and be applied for further functional genomic research.
Affiliation(s)
- Shengli Yao
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
| | - Fan Liang
- Nextomics Biosciences, Wuhan, 430000, Hubei, China
| | - Rafaqat Ali Gill
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
| | - Junyan Huang
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Faculty of Life Science, Hubei University, Wuhan, 430062, Hubei, China
| | - Xiaohui Cheng
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Faculty of Life Science, Hubei University, Wuhan, 430062, Hubei, China
| | - Yueying Liu
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Faculty of Life Science, Hubei University, Wuhan, 430062, Hubei, China
| | - Chaobo Tong
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Faculty of Life Science, Hubei University, Wuhan, 430062, Hubei, China
| | - Shengyi Liu
- The Key Laboratory of Biology and Genetic Improvement of Oil Crops, the Ministry of Agriculture and Rural Affairs of the PRC, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hubei Collaborative Innovation Center for Green Transformation of Bio-Resources, Faculty of Life Science, Hubei University, Wuhan, 430062, Hubei, China
| |
|
30
|
Szabo EX, Reichert P, Lehniger MK, Ohmer M, de Francisco Amorim M, Gowik U, Schmitz-Linneweber C, Laubinger S. Metabolic Labeling of RNAs Uncovers Hidden Features and Dynamics of the Arabidopsis Transcriptome. THE PLANT CELL 2020; 32:871-887. [PMID: 32060173 PMCID: PMC7145469 DOI: 10.1105/tpc.19.00214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/27/2019] [Revised: 01/14/2020] [Accepted: 02/11/2020] [Indexed: 05/05/2023]
Abstract
Transcriptome analysis by RNA sequencing (RNA-seq) has become an indispensable research tool in modern plant biology. Virtually all RNA-seq studies provide a snapshot of the steady state transcriptome, which contains valuable information about RNA populations at a given time but lacks information about the dynamics of RNA synthesis and degradation. Only a few specialized sequencing techniques, such as global run-on sequencing, have been used to provide information about RNA synthesis rates in plants. Here, we demonstrate that RNA labeling with the modified, nontoxic uridine analog 5-ethynyl uridine (5-EU) in Arabidopsis (Arabidopsis thaliana) seedlings provides insight into plant transcriptome dynamics. Pulse labeling with 5-EU revealed nascent and unstable RNAs, RNA processing intermediates generated by splicing, and chloroplast RNAs. Pulse-chase experiments with 5-EU allowed us to determine RNA stabilities without the need for chemical transcription inhibitors such as actinomycin and cordycepin. Inhibitor-free, genome-wide analysis of polyadenylated RNA stability via 5-EU pulse-chase experiments revealed RNAs with shorter half-lives than those reported after chemical inhibition of transcription. In summary, our results indicate that the Arabidopsis nascent transcriptome contains unstable RNAs and RNA processing intermediates and suggest that polyadenylated RNAs have low stability in plants. Our technique lays the foundation for easy, affordable, nascent transcriptome analysis and inhibitor-free analysis of RNA stability in plants.
Affiliation(s)
- Emese Xochitl Szabo
- Institute for Biology and Environmental Science, University of Oldenburg, 26129 Oldenburg, Germany
- Centre for Plant Molecular Biology, University of Tübingen, 72074 Tübingen, Germany
- Chemical Genomics Centre of the Max Planck Society, 44227 Dortmund, Germany
| | - Philipp Reichert
- Institute for Biology and Environmental Science, University of Oldenburg, 26129 Oldenburg, Germany
- Centre for Plant Molecular Biology, University of Tübingen, 72074 Tübingen, Germany
- Chemical Genomics Centre of the Max Planck Society, 44227 Dortmund, Germany
| | | | - Marilena Ohmer
- Centre for Plant Molecular Biology, University of Tübingen, 72074 Tübingen, Germany
| | | | - Udo Gowik
- Institute for Biology and Environmental Science, University of Oldenburg, 26129 Oldenburg, Germany
| | | | - Sascha Laubinger
- Institute for Biology and Environmental Science, University of Oldenburg, 26129 Oldenburg, Germany
- Centre for Plant Molecular Biology, University of Tübingen, 72074 Tübingen, Germany
- Chemical Genomics Centre of the Max Planck Society, 44227 Dortmund, Germany
- Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| |
|
31
|
Weirick T, Militello G, Hosen MR, John D, Moore JB, Uchida S. Investigation of RNA Editing Sites within Bound Regions of RNA-Binding Proteins. High Throughput 2019; 8:ht8040019. [PMID: 31795425 PMCID: PMC6970233 DOI: 10.3390/ht8040019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/17/2019] [Revised: 11/08/2019] [Accepted: 11/27/2019] [Indexed: 12/16/2022] Open
Abstract
Studies in epitranscriptomics indicate that RNA is modified by a variety of enzymes. Among these RNA modifications, adenosine to inosine (A-to-I) RNA editing occurs frequently in the mammalian transcriptome. These RNA editing sites can be detected directly from RNA sequencing (RNA-seq) data by examining nucleotide changes from adenosine (A) to guanine (G), which substitutes for inosine (I). However, a careful investigation of such nucleotide changes must be conducted to distinguish sequencing errors and genomic mutations from the genuine editing sites. Building upon our recent introduction of an easy-to-use bioinformatics tool, RNA Editor, to detect RNA editing events from RNA-seq data, we examined the extent to which RNA editing events affect the binding of RNA-binding proteins (RBP). Using bioinformatic techniques, we found that RNA editing sites occur frequently in RBP-bound regions. Moreover, the presence of RNA editing sites is more frequent when RNA editing islands, which are regions in which RNA editing sites are present in clusters, are examined. When the binding of one RBP, human antigen R [HuR; encoded by ELAV-like protein 1 (ELAV1)], was quantified experimentally, its binding was reduced upon silencing of the RNA editing enzyme adenosine deaminases acting on RNA (ADAR) compared to the control, suggesting that the presence of RNA editing islands influences HuR binding to its target regions. These data indicate RNA editing as an important mediator of RBP-RNA interactions, a mechanism which likely constitutes an additional mode of post-transcription gene regulation in biological systems.
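The detection principle mentioned in the abstract (A-to-G mismatches in RNA-seq reads, filtered against genomic variants and low-support calls) can be sketched roughly as follows. This is not the RNA Editor algorithm; the function name, the tuple layout of the pileup records, and the depth and fraction thresholds are hypothetical choices for illustration only.

```python
def candidate_editing_sites(pileups, known_snps, min_depth=10, min_fraction=0.1):
    """Return (chrom, pos, edited_fraction) for candidate A-to-I sites.

    pileups: iterable of (chrom, pos, ref_base, strand, base_counts), where
    base_counts maps an observed base to its read count (assumed layout).
    known_snps: set of (chrom, pos) genomic variants to exclude.
    """
    # On the minus strand an A-to-I edit shows up as T-to-C against the reference.
    expected = {"+": ("A", "G"), "-": ("T", "C")}
    sites = []
    for chrom, pos, ref_base, strand, base_counts in pileups:
        unedited, edited = expected[strand]
        # Only reference adenosines that are not known genomic variants qualify.
        if ref_base != unedited or (chrom, pos) in known_snps:
            continue
        depth = sum(base_counts.values())
        # Low coverage makes sequencing errors hard to tell apart from editing.
        if depth < min_depth:
            continue
        fraction = base_counts.get(edited, 0) / depth
        if fraction >= min_fraction:
            sites.append((chrom, pos, round(fraction, 3)))
    return sites


if __name__ == "__main__":
    example = [
        ("chr1", 100, "A", "+", {"A": 15, "G": 5}),
        ("chr1", 200, "A", "+", {"A": 18, "G": 2}),   # listed as a known SNP below
        ("chr1", 300, "T", "-", {"T": 8, "C": 4}),
        ("chr1", 400, "A", "+", {"A": 3, "G": 3}),    # too shallow
    ]
    print(candidate_editing_sites(example, known_snps={("chr1", 200)}))
```

A real pipeline would add further filters (mapping quality, strand bias, replicate support), which are omitted here to keep the sketch short.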
Affiliation(s)
- Tyler Weirick
- Cardiovascular Innovation Institute, University of Louisville, Louisville, KY 40202, USA
- RIKEN Center for Integrative Medical Sciences (IMS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
| | - Giuseppe Militello
- Cardiovascular Innovation Institute, University of Louisville, Louisville, KY 40202, USA
- Department of Molecular Cellular and Developmental Biology, Yale University, Yale Science Building-260 Whitney Avenue, New Haven, CT 06511, USA;
| | - Mohammed Rabiul Hosen
- Department of Internal Medicine-II, Molecular Cardiology, Biomedical Center (BMZ), University of Bonn, Sigmund-Freud-Str. 25, Bonn 53127, Germany;
| | - David John
- Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main 60590, Germany;
| | - Joseph B. Moore
- The Christina Lee Brown Envirome Institute, Department of Medicine, University of Louisville, Louisville, KY 40202, USA;
- Diabetes and Obesity Center, University of Louisville, Louisville, KY 40202, USA
| | - Shizuka Uchida
- Cardiovascular Innovation Institute, University of Louisville, Louisville, KY 40202, USA
- The Christina Lee Brown Envirome Institute, Department of Medicine, University of Louisville, Louisville, KY 40202, USA
- Diabetes and Obesity Center, University of Louisville, Louisville, KY 40202, USA
- Correspondence: Tel.: +1-502-854-0570
| |
|
32
|
Wulf MG, Maguire S, Humbert P, Dai N, Bei Y, Nichols NM, Corrêa IR, Guan S. Non-templated addition and template switching by Moloney murine leukemia virus (MMLV)-based reverse transcriptases co-occur and compete with each other. J Biol Chem 2019; 294:18220-18231. [PMID: 31640989 PMCID: PMC6885630 DOI: 10.1074/jbc.ra119.010676] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/16/2019] [Revised: 10/17/2019] [Indexed: 11/21/2022] Open
Abstract
Single-cell RNA-Seq (scRNA-Seq) has led to an unprecedented understanding of gene expression and regulation in individual cells. Many scRNA-Seq approaches rely upon the template switching property of Moloney murine leukemia virus (MMLV)-type reverse transcriptases. Template switching is believed to happen in a sequential process involving nontemplated addition of three protruding nucleotides (+CCC) to the 3′-end of the nascent cDNA, which can then anneal to the matching rGrGrG 3′-end of the template-switching oligo (TSO), allowing the reverse transcriptase (RT) to switch templates and continue copying the TSO sequence. In this study, we present a detailed analysis of template switching biases with respect to the RNA template, specifically of the role of the sequence and nature of its 5′-end (capped versus noncapped) in these biases. Our findings confirmed that the presence of a 5′-m7G cap enhances template switching efficiency. We also profiled the composition of the nontemplated addition in the absence of TSO and observed that the 5′-end of RNA template influences the terminal transferase activity of the RT. Furthermore, we found that designing new TSOs that pair with the most common nontemplated additions did little to improve template switching efficiency. Our results provide evidence suggesting that, in contrast to the current understanding of the template switching process, nontemplated addition and template switching are concurrent and competing processes.
Affiliation(s)
| | - Sean Maguire
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| | - Paul Humbert
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| | - Nan Dai
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| | - Yanxia Bei
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| | | | - Ivan R Corrêa
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| | - Shengxi Guan
- New England Biolabs, Inc., Ipswich, Massachusetts 01938
| |
|
33
|
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet 2019; 20:631-656. [DOI: 10.1038/s41576-019-0150-2] [Citation(s) in RCA: 679] [Impact Index Per Article: 135.8] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
|
34
|
Jiang F, Zhang J, Liu Q, Liu X, Wang H, He J, Kang L. Long-read direct RNA sequencing by 5'-Cap capturing reveals the impact of Piwi on the widespread exonization of transposable elements in locusts. RNA Biol 2019; 16:950-959. [PMID: 30982421 PMCID: PMC6546357 DOI: 10.1080/15476286.2019.1602437] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/14/2019] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 12/20/2022] Open
Abstract
The large genome of the migratory locust (Locusta migratoria) has accumulated a massive number of transposable elements (TEs), which show intrinsic transcriptional activity. Exonized TEs hamper the precise determination of full-length RNA transcript sequences because they produce numerous highly similar fragments that are difficult to resolve using short-read sequencing technology. Here, we applied a 5'-Cap capturing method using Nanopore long-read direct RNA sequencing to characterize full-length transcripts in their native RNA form and to analyze the TE exonization pattern in the locust transcriptome. Our results revealed the widespread establishment of TE exonization and a substantial contribution of TEs to RNA splicing in the locust transcriptome. The results of the transcriptomic spectrum influenced by Piwi expression indicated that TE-derived sequences were the main targets of Piwi-mediated repression. Furthermore, our study showed that Piwi expression regulates the length of RNA transcripts containing TE-derived sequences, creating alternative UTR usage. Overall, our results reveal the transcriptomic characteristics of TE exonization in species characterized by large and repetitive genomes.
Affiliation(s)
- Feng Jiang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Jie Zhang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
| | - Qing Liu
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing, China
| | - Xiang Liu
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Huimin Wang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
| | - Jing He
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Le Kang
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| |
|
35
|
Zhao L, Zhang H, Kohnen MV, Prasad KVSK, Gu L, Reddy ASN. Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing. Front Genet 2019; 10:253. [PMID: 30949200 PMCID: PMC6438080 DOI: 10.3389/fgene.2019.00253] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 10/15/2018] [Accepted: 03/06/2019] [Indexed: 12/18/2022] Open
Abstract
Nanopore sequencing from Oxford Nanopore Technologies (ONT) and Pacific BioSciences (PacBio) single-molecule real-time (SMRT) long-read isoform sequencing (Iso-Seq) are revolutionizing the way transcriptomes are analyzed. These methods offer many advantages over the most widely used high-throughput short-read RNA sequencing (RNA-Seq) approaches and allow a comprehensive analysis of transcriptomes in identifying full-length splice isoforms and several other post-transcriptional events. In addition, direct RNA-Seq provides valuable information about RNA modifications, which are lost during the PCR amplification step in other methods. Here, we present a comprehensive summary of important applications of these technologies in plants, including identification of complex alternative splicing (AS), full-length splice variants, fusion transcripts, and alternative polyadenylation (APA) events. Furthermore, we discuss the impact of the newly developed nanopore direct RNA-Seq in advancing epitranscriptome research in plants. Additionally, we summarize computational tools for identifying and quantifying full-length isoforms and other co/post-transcriptional events and discuss some of the limitations of these methods. Sequencing of transcriptomes using these new single-molecule long-read methods will unravel many aspects of transcriptome complexity in unprecedented ways as compared to previous short-read sequencing approaches. Analysis of plant transcriptomes with these new powerful methods that require minimum sample processing is likely to become the norm and is expected to uncover novel co/post-transcriptional gene regulatory mechanisms that control biological outcomes during plant development and in response to various stresses.
Affiliation(s)
- Liangzhen Zhao
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hangxiao Zhang
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Markus V. Kohnen
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Kasavajhala V. S. K. Prasad
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Anireddy S. N. Reddy
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| |
|
36
|
Trotman JB, Schoenberg DR. A recap of RNA recapping. Wiley Interdiscip Rev RNA 2019; 10:e1504. [PMID: 30252202 PMCID: PMC6294674 DOI: 10.1002/wrna.1504] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 05/25/2018] [Revised: 07/13/2018] [Accepted: 08/01/2018] [Indexed: 12/12/2022]
Abstract
The N7-methylguanosine cap is a hallmark of the 5' end of eukaryotic mRNAs and is required for gene expression. Loss of the cap was believed to lead irreversibly to decay. However, nearly a decade ago, it was discovered that mammalian cells contain enzymes in the cytoplasm that are capable of restoring caps onto uncapped RNAs. In this review, we summarize recent advances in our understanding of cytoplasmic RNA recapping and discuss the biochemistry of this process and its impact on regulating and diversifying the transcriptome. Although most studies focus on mammalian RNA recapping, we also highlight new observations for recapping in disparate eukaryotic organisms, with the trypanosome recapping system appearing to be a fascinating example of convergent evolution. We conclude with emerging insights into the biological significance of RNA recapping and prospects for the future of this evolving area of study. This article is categorized under: RNA Processing > RNA Editing and Modification; Translation > Translation Regulation; RNA Processing > Capping and 5' End Modifications; RNA Turnover and Surveillance > Regulation of RNA Stability.
Affiliation(s)
- Jackson B. Trotman
- Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH 43210
| | - Daniel R. Schoenberg
- Department of Biological Chemistry and Pharmacology, Center for RNA Biology, The Ohio State University, Columbus, OH 43210
| |
|
37
|
Piriyapongsa J, Kaewprommal P, Vaiwsri S, Anuntakarun S, Wirojsirasak W, Punpee P, Klomsa-Ard P, Shaw PJ, Pootakham W, Yoocha T, Sangsrakru D, Tangphatsornruang S, Tongsima S, Tragoonrung S. Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing. PeerJ 2018; 6:e5818. [PMID: 30397543 PMCID: PMC6214230 DOI: 10.7717/peerj.5818] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/11/2018] [Accepted: 09/23/2018] [Indexed: 12/15/2022] Open
Abstract
Background: Sugarcane is an important global food crop and energy resource. To facilitate the sugarcane improvement program, genome and gene information are important for studying traits at the molecular level. Most currently available transcriptome data for sugarcane were generated using second-generation sequencing platforms, which provide short reads. The de novo assembled transcripts from these data are limited in length, and hence may be incomplete and inaccurate, especially for long RNAs. Methods: We generated a transcriptome dataset of leaf tissue from a commercial Thai sugarcane cultivar, Khon Kaen 3 (KK3), using PacBio RS II single-molecule long-read sequencing by the Iso-Seq method. Short-read RNA-Seq data were generated from the same RNA sample using the Ion Proton platform to reduce base-calling errors. Results: A total of 119,339 error-corrected transcripts were generated with an N50 length of 3,611 bp, which is on average longer than any previously reported sugarcane transcriptome dataset. A total of 110,253 sequences (92.4%) contain an open reading frame (ORF) of at least 300 bp, with an ORF N50 of 1,416 bp. The mean lengths of the 5′ and 3′ untranslated regions in 73,795 sequences with complete ORFs are 1,249 and 1,187 bp, respectively. A total of 4,774 transcripts are putatively novel full-length transcripts that do not match a previous Iso-Seq study of sugarcane. We annotated the functions of 68,962 putative full-length transcripts with at least 90% coverage when compared with homologous protein-coding sequences in other plants. Discussion: The new catalog of transcripts will be useful for genome annotation, identification of splicing variants, SNP identification, and other research pertaining to the sugarcane improvement program. The putatively novel transcripts suggest unique features of KK3, although more data from different tissues and stages of development are needed to establish a reference transcriptome of this cultivar.
Affiliation(s)
- Jittima Piriyapongsa
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Pavita Kaewprommal
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Sirintra Vaiwsri
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Songtham Anuntakarun
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | | | - Prapat Punpee
- Mitr Phol Sugarcane Research Center Co., Ltd., Chaiyaphum, Thailand
| | | | - Philip J Shaw
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Wirulda Pootakham
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Thippawan Yoocha
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Duangjai Sangsrakru
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Sithichoke Tangphatsornruang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Sissades Tongsima
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| | - Somvong Tragoonrung
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand
| |
|
38
|
Tardaguila M, de la Fuente L, Marti C, Pereira C, Pardo-Palacios FJ, Del Risco H, Ferrell M, Mellado M, Macchietto M, Verheggen K, Edelmann M, Ezkurdia I, Vazquez J, Tress M, Mortazavi A, Martens L, Rodriguez-Navarro S, Moreno-Manzano V, Conesa A. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res 2018; 28:396-411. [PMID: 29440222 PMCID: PMC5848618 DOI: 10.1101/gr.222976.117] [Citation(s) in RCA: 224] [Impact Index Per Article: 37.3] [Received: 03/17/2017] [Accepted: 01/08/2018] [Indexed: 01/15/2023]
Abstract
High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that a substantial number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact on the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
Affiliation(s)
- Manuel Tardaguila
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Lorena de la Fuente
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
| | - Cristina Marti
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
| | - Cécile Pereira
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | | | - Hector Del Risco
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Marc Ferrell
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | | | - Marissa Macchietto
- Department of Developmental and Cell Biology, University of California, Irvine, California 92617, USA
| | - Kenneth Verheggen
- VIB-UGent Center for Medical Biotechnology, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Mariola Edelmann
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Iakes Ezkurdia
- Centro Nacional de Investigaciones Cardiovasculares CNIC, 28029 Madrid, Spain
| | - Jesus Vazquez
- Centro Nacional de Investigaciones Cardiovasculares CNIC, 28029 Madrid, Spain
| | - Michael Tress
- Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, California 92617, USA
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Susana Rodriguez-Navarro
- Gene Expression and mRNA Metabolism Laboratory, CSIC, IBV, 46010 Valencia, Spain
- Gene Expression and mRNA Metabolism Laboratory, CIPF, 46012 Valencia, Spain
| | | | - Ana Conesa
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
| |
|
39
|
Bayega A, Wang YC, Oikonomopoulos S, Djambazian H, Fahiminiya S, Ragoussis J. Transcript Profiling Using Long-Read Sequencing Technologies. Methods Mol Biol 2018; 1783:121-147. [PMID: 29767360 DOI: 10.1007/978-1-4939-7834-2_6] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/08/2023]
Abstract
RNA sequencing using next-generation sequencing (NGS, RNA-Seq) technologies is currently the standard approach for gene expression profiling, particularly for large-scale high-throughput studies. NGS technologies comprise short-read RNA-Seq (dominated by Illumina) and long-read RNA-Seq technologies provided by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). Although short-read sequencing technologies are the most widely used, long-read technologies are increasingly becoming the standard approach for de novo transcriptome assembly and isoform expression quantification because of the complex nature of the transcriptome, which consists of transcripts of variable length and multiple alternatively spliced isoforms for most genes. In this chapter, we describe experimental procedures for library preparation, sequencing, and associated data analysis approaches for PacBio and ONT, with a major focus on full-length cDNA synthesis, de novo transcriptome assembly, and isoform quantification.
Affiliation(s)
- Anthony Bayega
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
| | - Yu Chang Wang
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
| | - Spyros Oikonomopoulos
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
| | - Haig Djambazian
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
| | - Somayyeh Fahiminiya
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
- Cancer Research Program, The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
| | - Jiannis Ragoussis
- Department of Human Genetics, McGill University and Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
- Department of Bioengineering, McGill University, Montréal, QC, Canada
- Cancer and Mutagen Unit, King Fahd Center for Medical Research, Department of Biochemistry, Center of Innovation in Personalized Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|