1
Panzade KP, Tribhuvan KU, Pawar DV, Jasrotia RS, Gaikwad K, Dalal M, Kumar RR, Singh MP, Awasthi OP, Padaria JC. Discovering the regulators of heat stress tolerance in Ziziphus nummularia (Burm.f) wight and walk.-arn. Physiol Mol Biol Plants 2024; 30:497-511. [PMID: 38633271 PMCID: PMC11018567 DOI: 10.1007/s12298-024-01431-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 12/14/2023] [Accepted: 03/01/2024] [Indexed: 04/19/2024]
Abstract
Ziziphus nummularia, an elite heat-stress-tolerant shrub, grows in arid desert regions. However, the molecular mechanism responsible for its heat stress tolerance is unexplored. Therefore, we analysed the whole transcriptome of the Jaisalmer (heat-tolerant) and Godhra (heat-sensitive) genotypes of Z. nummularia to understand this mechanism. De novo assembly of 162,225,052 clean reads yielded 276,029 transcripts. A total of 208,506 unigenes were identified, which contain 4290 and 1043 differentially expressed genes (DEGs) in TGO (treated Godhra at 42 °C) vs. CGO (control Godhra) and TJR (treated Jaisalmer at 42 °C) vs. CJR (control Jaisalmer), respectively. A total of 987 (67 highly enriched) and 754 (34 highly enriched) pathways were observed in CGO vs. TGO and CJR vs. TJR, respectively. Antioxidant pathways and TFs like Homeobox, HBP, ARR, PHD, GRAS, CPP, and E2FA were uniquely observed in the Godhra genotype, and SET domains were uniquely observed in the Jaisalmer genotype. Further, transposable elements were highly up-regulated in the Godhra genotype but showed no activation in the Jaisalmer genotype. A total of 43,093 and 39,278 simple sequence repeats were identified in the Godhra and Jaisalmer genotypes, respectively. A total of 10 DEGs linked to heat stress were validated in both genotypes for their expression under different heat stresses using quantitative real-time PCR. Comparing expression patterns of the selected DEGs identified ClpB1 as a potential candidate gene for heat tolerance in Z. nummularia. Here we present the first characterized transcriptome of Z. nummularia in response to heat stress for the identification and characterization of heat stress-responsive genes. Supplementary Information The online version contains supplementary material available at 10.1007/s12298-024-01431-y.
Affiliation(s)
- Kishor Prabhakar Panzade
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Kishor U. Tribhuvan
  - ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, Jharkhand 834 003 India
- Deepak V. Pawar
  - ICAR-Directorate of Weed Research, Maharajpur, Jabalpur, Madhya Pradesh 482004 India
- Rahul Singh Jasrotia
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 Delhi India
  - University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr., San Antonio, TX 78229 USA
- Kishor Gaikwad
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Monika Dalal
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Ranjeet Ranjan Kumar
  - Division of Biochemistry, ICAR-Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Madan Pal Singh
  - Division of Plant Physiology, ICAR-Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Om Prakash Awasthi
  - Division of Horticulture, ICAR-Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
- Jasdeep Chatrath Padaria
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 Delhi India
  - PG School, Indian Agricultural Research Institute, New Delhi, 110 012 Delhi India
2
Karnaneedi S, Limviphuvadh V, Maurer-Stroh S, Lopata AL. De Novo Transcriptomic Analyses to Identify and Compare Allergens in Foods. Methods Mol Biol 2024; 2717:351-365. [PMID: 37737997 DOI: 10.1007/978-1-0716-3453-0_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/23/2023]
Abstract
Food allergens have been traditionally identified using biomolecular and immunological approaches. However, the techniques used in extracting proteins from the food source to be analyzed may hinder the availability of all proteins when assessing immunological allergenicity. Additionally, depending on the number and pool of patient sera used to detect the IgE antibody-binding allergens, some allergens may not be detected if not all the patients in the pool are sensitized to all the allergens. To overcome these limitations, we describe an additional approach before the in vitro approaches, by analyzing the transcriptome in silico for all putative allergens within the analyzed food source.
Affiliation(s)
- Shaymaviswanathan Karnaneedi
  - Molecular Allergy Research Laboratory, College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
  - Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, QLD, Australia
  - Centre for Food and Allergy Research, Murdoch Children's Research Institute, Melbourne, VIC, Australia
- Vachiranee Limviphuvadh
  - Biomolecular Function Discovery Division, Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore
  - IFCS Programme, Singapore Institute for Food and Biotechnology Innovation, Agency for Science, Technology and Research, Singapore, Singapore
- Sebastian Maurer-Stroh
  - Biomolecular Function Discovery Division, Bioinformatics Institute, Agency for Science, Technology and Research, Singapore, Singapore
  - IFCS Programme, Singapore Institute for Food and Biotechnology Innovation, Agency for Science, Technology and Research, Singapore, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Andreas L Lopata
  - Molecular Allergy Research Laboratory, College of Public Health, Medical and Veterinary Sciences, James Cook University, Townsville, QLD, Australia
  - Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, QLD, Australia
  - Centre for Food and Allergy Research, Murdoch Children's Research Institute, Melbourne, VIC, Australia
  - Tropical Futures Institute, James Cook University Singapore, Singapore, Singapore
3
Acebal MC, Dalgaard LT, Jørgensen TS, Hansen BW. Embryogenesis of a calanoid copepod analyzed by transcriptomics. Comp Biochem Physiol Part D Genomics Proteomics 2023; 45:101054. [PMID: 36565589 DOI: 10.1016/j.cbd.2022.101054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 11/22/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]
Abstract
The calanoid copepod Acartia tonsa (Dana) has attracted interest because of its use as a copepod model organism as well as its potential economic role as live feed for fish larvae. While the adult genome and transcriptome of A. tonsa have been investigated, no studies have investigated the genome-wide transcriptional changes during normal subitaneous embryogenesis. Thus, the aim of the current study was to investigate these transcriptional changes throughout A. tonsa embryonic development. RNA extraction and de novo transcriptome assembly were conducted for the subitaneous embryogenesis of the copepod. The assembly includes, for the first time, samples describing quiescent development and helps establish a framework for future studies on the molecular biology of this species. Among the findings reported, sequences annotated to well-known developmental genes were identified. We also describe the molecular changes and gene expression levels throughout the entire 42 h of embryonic development. In conclusion, here we present the most complete genome-wide transcriptional map of early copepod embryonic development to date, enabling further use of A. tonsa as a model organism for crustacean development. Keywords: enrichment of pathways; subitaneous embryogenesis; comparative genomics; transcriptome assembly; invertebrate genomics.
Affiliation(s)
- Miguel Cifuentes Acebal
  - Department of Science and Environment, Roskilde University, Universitetsvej 1, DK-4000 Roskilde, Denmark
- Louise Torp Dalgaard
  - Department of Science and Environment, Roskilde University, Universitetsvej 1, DK-4000 Roskilde, Denmark
- Tue Sparholt Jørgensen
  - Department of Science and Environment, Roskilde University, Universitetsvej 1, DK-4000 Roskilde, Denmark
  - Department of Environmental Science - Environmental Microbiology and Biotechnology, Aarhus University, Frederiksborgvej 399, DK-4000 Roskilde, Denmark
  - The Novo Nordisk Foundation Center for Biosustainability (DTU Biosustain) at the Technical University of Denmark, Building 220, Kemitorvet, DK-2800 Kgs. Lyngby, Denmark
- Benni Winding Hansen
  - Department of Science and Environment, Roskilde University, Universitetsvej 1, DK-4000 Roskilde, Denmark
4
Abstract
Polyploidizations, or whole-genome duplications (WGDs), in plants have increased biological complexity, facilitated evolutionary innovation, and likely enabled adaptation under harsh conditions. Besides genomic data, transcriptome data have been widely employed to detect WGDs, due to their efficient accessibility to the gene space of a species. Age distributions based on synonymous substitutions (so-called KS age distributions) for paralogs assembled from transcriptome data have identified numerous WGDs in plants, paving the way for further studies on the importance of WGDs for the evolution of seed and flowering plants. However, it is still unclear how transcriptome-based age distributions compare to those based on genomic data. In this chapter, we implemented three different de novo transcriptome assembly pipelines with two popular assemblers, i.e., Trinity and SOAPdenovo-Trans. We selected six plant species with published genomes and transcriptomes to evaluate how assembled transcripts from different pipelines perform when using KS distributions to detect previously documented WGDs in the six species. Further, using genes predicted in each genome as references, we evaluated the effects of missing genes, gene family clustering, and de novo assembled transcripts on the transcriptome-based KS distributions. Our results show that, although the transcriptome-based KS distributions differ from the genome-based ones with respect to their shapes and scales, they are still reasonably reliable for unveiling WGDs, except in species where most duplicates originated from a recent WGD. We also discuss how to overcome some possible pitfalls when using transcriptome data to identify WGDs.
Affiliation(s)
- Jia Li
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - VIB Center for Plant Systems Biology, VIB, Ghent, Belgium
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Zhen Li
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
5
Tribhuvan KU, Singh DK, Pradhan B, Bishi SK, Pandey A, Kumar S, Bhati J, Mishra DC, Das A, Sharma TR, Pattanayak A, Singh BK. Sequencing and de novo transcriptome assembly for discovering regulators of gene expression in Jack (Artocarpus heterophyllus). Genomics 2022; 114:110356. [PMID: 35364267 DOI: 10.1016/j.ygeno.2022.110356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 03/12/2022] [Accepted: 03/27/2022] [Indexed: 01/14/2023]
Abstract
Jack (Artocarpus heterophyllus) is a multipurpose fruit-tree species with minimal genomic resources. The study reports developing comprehensive transcriptome data containing 80,411 unigenes with an N50 value of 1265 bp. We predicted 64,215 CDSs from the unigenes and annotated and functionally categorized them into the biological process (23,230), molecular function (27,149), and cellular components (17,284). From 80,411 unigenes, we discovered 16,853 perfect SSRs with 192 distinct repeat motif types reiterating 4 to 22 times. Besides, we identified 2741 TFs from 69 TF families, 53 miRNAs from 19 conserved miRNA families, 25,953 potential lncRNAs, and placed three functional eTMs in different lncRNA-miRNA pairs. The regulatory networks involving genes, TFs, and miRNAs identified several regulatory and regulated nodes providing insight into miRNAs' gene associations and transcription factor-mediated regulation. The comparison of expression patterns of some selected miRNAs vis-à-vis their corresponding target genes showed an inverse relationship indicating the possible miRNA-mediated regulation of the genes.
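The N50 value quoted in this abstract is a standard assembly-contiguity statistic: the length L such that sequences of length L or longer account for at least half of the total assembled bases. As a minimal sketch of how such a value is typically computed (the function name and the toy lengths are illustrative assumptions, not taken from the study):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical unigene lengths in bp, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400
```

With the toy input, the total is 1500 bp; the two longest sequences (500 + 400) already exceed half of that, so the N50 is 400.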
6
Ren X, Lv J, Liu M, Wang Q, Shao H, Liu P, Li J. A chromosome-level genome of the kuruma shrimp (Marsupenaeus japonicus) provides insights into its evolution and cold-resistance mechanism. Genomics 2022; 114:110373. [PMID: 35460816 DOI: 10.1016/j.ygeno.2022.110373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 04/05/2022] [Accepted: 04/18/2022] [Indexed: 01/14/2023]
Abstract
Marsupenaeus japonicus is an important marine crustacean species. However, a lack of genomic resources hinders the use of whole genome sequencing to explore its genetic basis and molecular mechanisms for genome-assisted breeding. Consequently, we determined a chromosome-level genome assembly for M. japonicus from a total of 665.19 Gb of genomic sequencing data, yielding an approximately 1.54 Gb assembly with a contig N50 size of 229.97 kb and a scaffold N50 size of 38.27 Mb. With the high-throughput chromosome conformation capture (Hi-C) technology, we anchored 18,019 contigs onto 42 pseudo-chromosomes, accounting for 99.40% of the total genome assembly. Analysis of the present M. japonicus genome revealed 24,317 protein-coding genes and a high proportion of repetitive sequences (61.56%). The high-quality genome assembly enabled the identification of genes associated with cold stress and cold tolerance in kuruma shrimp through the comparison of eyestalk transcriptomes between low temperature-stressed shrimp (10 °C) and normal temperature shrimp (28 °C). The genome assembly presented here could be useful in future studies to reveal the molecular mechanisms of M. japonicus in response to low temperature stress and for molecular-assisted breeding of M. japonicus at low temperature.
Affiliation(s)
- Xianyun Ren
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
- Jianjian Lv
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
- Meng Liu
  - Novogene Bioinformatics Institute, Beijing, PR China
- Qiong Wang
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
- Huixin Shao
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
  - College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, PR China
- Ping Liu
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
- Jian Li
  - Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, PR China
  - Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, PR China
7
Suranjika S, Pradhan S, Nayak SS, Parida A. De novo transcriptome assembly and analysis of gene expression in different tissues of moth bean (Vigna aconitifolia) (Jacq.) Marechal. BMC Plant Biol 2022; 22:198. [PMID: 35428206 PMCID: PMC9013028 DOI: 10.1186/s12870-022-03583-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 04/04/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND The underutilized species Vigna aconitifolia (moth bean) is an important legume crop cultivated in semi-arid conditions and is valued for the high protein content of its seeds. It is also a popular green manure cover crop that offers many agronomic benefits including nitrogen fixation and soil nutrients. Despite its economic potential, genomic resources for this crop are scarce and there is limited knowledge of the developmental process of this plant at a molecular level. In the present communication, we have studied the molecular mechanisms that regulate plant development in V. aconitifolia, with a special focus on flower and seed development. We believe that this study will greatly enrich the genomic resources for this plant in the form of differentially expressed genes, transcription factors, and genic molecular markers. RESULTS We performed de novo transcriptome assembly using six types of tissue from various developmental stages of Vigna aconitifolia (var. RMO-435), namely leaves, roots, flowers, pods, and seed tissue in the early and late stages of development, using the Illumina NextSeq platform. We assembled the transcriptome to obtain 150938 unigenes with an average length of 937.78 bp. About 79.9% of these unigenes were annotated in public databases and 12839 of those unigenes showed a significant match in the KEGG database. Most of the unigenes displayed significant differential expression in the late stages of seed development as compared with leaves. We annotated 74082 unigenes as transcription factors and identified 12096 simple sequence repeats (SSRs) in the genic regions of V. aconitifolia. Digital expression analysis revealed specific gene activities in different tissues, which were validated using real-time PCR analysis. CONCLUSIONS The Vigna aconitifolia transcriptomic resources generated in this study provide foundational resources for gene discovery with respect to various developmental stages. This study provides the first comprehensive analysis revealing the genes involved in molecular as well as metabolic pathways that regulate seed development and may be responsible for the unique nutritive values of moth bean seeds. Hence, this study would serve as a foundation for the characterization of candidate genes, which would not only provide novel insights into seed development but also provide resources for the genetic enhancement of moth bean and related species.
Affiliation(s)
- Sandhya Suranjika
  - Institute of Life Sciences (ILS), An autonomous Institute under Department of Biotechnology Government of India, NALCO Square, Bhubaneswar, Odisha India
  - Department of Biotechnology, Kalinga Institute of Industrial Technology (KIIT), KIIT Road, Patia, Bhubaneswar, Odisha India
- Seema Pradhan
  - Institute of Life Sciences (ILS), An autonomous Institute under Department of Biotechnology Government of India, NALCO Square, Bhubaneswar, Odisha India
- Soumya Shree Nayak
  - Institute of Life Sciences (ILS), An autonomous Institute under Department of Biotechnology Government of India, NALCO Square, Bhubaneswar, Odisha India
  - Department of Biotechnology, Kalinga Institute of Industrial Technology (KIIT), KIIT Road, Patia, Bhubaneswar, Odisha India
- Ajay Parida
  - Institute of Life Sciences (ILS), An autonomous Institute under Department of Biotechnology Government of India, NALCO Square, Bhubaneswar, Odisha India
8
Konstantinov DK, Menzorov A, Krivenko O, Doroshkov AV. Isolation and transcriptome analysis of a biotechnologically promising Black Sea protist, Thraustochytrium aureum ssp. strugatskii. PeerJ 2022; 10:e12737. [PMID: 35287351 PMCID: PMC8917795 DOI: 10.7717/peerj.12737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2020] [Accepted: 12/13/2021] [Indexed: 01/07/2023] Open
Abstract
Background Marine protists are an important part of the ocean ecosystem. They may possess unique sets of biosynthetic pathways and, thus, be promising model organisms for metabolic engineering to produce substances for the pharmaceutical, cosmetic, and perfume industries. Currently, full-genome data are available for only a limited number of protists, hampering their use in biotechnology. Methods We characterized the morphology of a new cultured strain of Thraustochytriaceae isolated from the Black Sea ctenophore Beroe ovata using phase-contrast microscopy. Cell culture was performed in the FAND culture medium based on fetal bovine serum and DMEM. Phylogenetic analysis was performed using the 18S rRNA sequence. We also conducted a transcriptome assembly and compared the data with the closest species. Results The protist belongs to the genus Thraustochytrium based on the 18S rRNA sequence analysis. We designated the isolated protist as T. aureum ssp. strugatskii. The closest species with a genome assembly is Schizochytrium aggregatum. Transcriptome analysis revealed the majority of the fatty acid synthesis enzymes. Conclusion Our findings suggest that T. aureum ssp. strugatskii is a promising candidate for biotechnological use. Together with previously available data, our data would allow the establishment of an accurate phylogeny of the family Thraustochytriaceae. They could also be a reference point for studying the evolution of the enzyme families.
Affiliation(s)
- Dmitrii K. Konstantinov
  - Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Aleksei Menzorov
  - Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Olga Krivenko
  - A.O. Kovalevsky Institute of Biology of the Southern Seas of RAS, Sevastopol, Russia
- Alexey V. Doroshkov
  - Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Siberian Federal University, Krasnoyarsk, Russia
9
Shrestha AMS, B Guiao JE, R Santiago KC. Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment. BMC Genomics 2022; 23:97. [PMID: 35120462 PMCID: PMC8815227 DOI: 10.1186/s12864-021-08278-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2021] [Accepted: 12/22/2021] [Indexed: 11/16/2022] Open
Abstract
Background RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. For organisms that lack a well-annotated reference genome or transcriptome, a conventional RNA-seq data analysis workflow requires constructing a de novo transcriptome assembly and annotating it against a high-confidence protein database. The assembly serves as a reference for read mapping, and the annotation is necessary for functional analysis of genes found to be differentially expressed. However, assembly is computationally expensive. It is also prone to errors that impact expression analysis, especially since sequencing depth is typically much lower for expression studies than for transcript discovery. Results We propose a shortcut, in which we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the high-confidence proteome that would otherwise have been used for annotation. By avoiding assembly, we drastically cut down computational costs: the running time on a typical dataset improves from the order of tens of hours to under half an hour, and the memory requirement is reduced from the order of tens of Gbytes to tens of Mbytes. We show through experiments on simulated and real data that our pipeline not only reduces computational costs, but also has higher sensitivity and precision than a typical assembly-based pipeline. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar. Conclusions The flip side of RNA-seq becoming accessible to even modestly resourced labs has been that the time, labor, and infrastructure cost of bioinformatics analysis has become a bottleneck. Assembly is one such resource-hungry process, and we show here that it can be avoided for quick and easy, yet more sensitive and precise, differential gene expression analysis in non-model organisms.
Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-08278-7).
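The counting step described in this abstract (reads aligned directly to a reference proteome, then tallied per protein) can be illustrated with a small sketch. This is not the authors' implementation: the input format (tuples of read ID, protein ID, alignment score), the best-hit rule, and all names below are assumptions made purely for illustration.

```python
from collections import Counter


def count_reads_per_protein(hits):
    """Tally reads per protein from read-to-protein alignment hits.

    `hits` is an iterable of (read_id, protein_id, score) tuples.
    Each read is assigned to its single best-scoring protein
    (an illustrative rule; real pipelines may handle multi-mapping
    reads differently).
    """
    best = {}  # read_id -> (protein_id, score)
    for read_id, protein_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (protein_id, score)
    return Counter(protein_id for protein_id, _ in best.values())


# Hypothetical alignment hits, for illustration only.
toy_hits = [
    ("read1", "protA", 50.0),
    ("read1", "protB", 42.0),  # weaker hit for read1, ignored
    ("read2", "protA", 61.0),
    ("read3", "protB", 38.0),
]
counts = count_reads_per_protein(toy_hits)
print(dict(counts))
```

A per-protein count table of this shape, one per sample, is the kind of input that standard differential expression tools consume.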
Affiliation(s)
- Anish M S Shrestha
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Joyce Emlyn B Guiao
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Mathematics and Statistics, College of Science, De La Salle University, Manila, Philippines
- Kyle Christian R Santiago
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
10
Sewe SO, Silva G, Sicat P, Seal SE, Visendi P. Trimming and Validation of Illumina Short Reads Using Trimmomatic, Trinity Assembly, and Assessment of RNA-Seq Data. Methods Mol Biol 2022; 2443:211-232. [PMID: 35037208 DOI: 10.1007/978-1-0716-2067-0_11] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 05/20/2023]
Abstract
Next-generation sequencing (NGS) technologies can generate billions of reads in a single sequencing run. However, with such high throughput come quality issues that have to be addressed before undertaking downstream analysis. Quality control on short reads is usually performed at default settings due to a lack of in-depth understanding of a particular software's parameters and of how changing them affects the output. Here we demonstrate how to optimize read trimming using Trimmomatic. We highlight the benefits of trimming by comparing the quality of transcripts assembled using trimmed and untrimmed reads.
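To make the idea of a trimming parameter concrete, the sketch below shows a simplified sliding-window quality trim: the read is scanned from the 5' end and cut at the first window whose mean Phred quality drops below a threshold. It is a conceptual illustration only and is not claimed to reproduce Trimmomatic's exact behaviour; the function name, defaults, and toy read are assumptions.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=15):
    """Trim a read at the first window with low mean quality.

    `seq` is the base string and `quals` the matching list of
    Phred scores. Reads shorter than `window` are returned
    unchanged (a simplification chosen for this sketch).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(quals) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_mean_q:
            # Keep everything before the failing window.
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical read whose quality collapses near the 3' end.
toy_seq = "ACGTACGTAC"
toy_quals = [30, 30, 30, 30, 30, 30, 10, 5, 5, 5]
trimmed_seq, trimmed_quals = sliding_window_trim(toy_seq, toy_quals)
print(trimmed_seq)  # ACGTA
```

In the toy read, the window starting at position 5 has a mean quality of 12.5, below the threshold of 15, so the read is cut to its first five bases. Widening the window or lowering the threshold makes trimming more lenient, which is the kind of trade-off the chapter examines.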
Affiliation(s)
- Steven O Sewe
  - Natural Resources Institute, University of Greenwich, Kent, UK
- Gonçalo Silva
  - Natural Resources Institute, University of Greenwich, Kent, UK
- Paulo Sicat
  - Natural Resources Institute, University of Greenwich, Kent, UK
- Susan E Seal
  - Natural Resources Institute, University of Greenwich, Kent, UK
- Paul Visendi
  - Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, Australia
11
Abstract
In this chapter, we describe methods for analyzing RNA-Seq data, presented as a flow along a pipeline beginning with raw data from a sequencer and ending with an output of differentially expressed genes and their functional characterization. The first section covers de novo transcriptome assembly for organisms lacking reference genomes or for those interested in probing against the background of organism-specific transcriptomes assembled from RNA-Seq data. Section 2 covers both gene- and transcript-level quantifications, leading to the third and final section on differential expression analysis between two or more conditions. The pipeline starts with raw sequence reads, followed by quality assessment and preprocessing of the input data to ensure a robust estimate of the transcripts and their differential regulation. The preprocessed data can be inputted into the de novo transcriptome flow to assemble transcripts, functionally annotated using tools such as InterProScan or Blast2Go and then forwarded to differential expression analysis flow, or directly inputted into the differential expression analysis flow if a reference genome is available. An online repository containing sample data has also been made available, as well as custom Python scripts to modify the output of the programs within the pipeline for various downstream analyses.
Affiliation(s)
- David J Burks
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
- Rajeev K Azad
  - Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
  - Department of Mathematics, University of North Texas, Denton, TX, USA
12
|
Langa J, Huret M, Montes I, Conklin D, Estonba A. Transcriptomic dataset for Sardina pilchardus: Assembly, annotation, and expression of nine tissues. Data Brief 2021; 39:107583. [PMID: 34849383 PMCID: PMC8609138 DOI: 10.1016/j.dib.2021.107583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2021] [Revised: 08/27/2021] [Accepted: 11/09/2021] [Indexed: 11/16/2022]
Abstract
European sardine or pilchard is a planktonic small pelagic fish present from the North Sea in Europe to the coast of Senegal in the North of Africa, and across the Mediterranean sea to the Black Sea. Ecologically, sardines are an intermediary link in the trophic network, preying on plankton and being predated by larger fishes, marine mammals, and seabirds. This species is of great nutritional and economic value as a cheap but rich source of protein and fat. It is either consumed directly by humans or fed as fishmeal for aquaculture and farm animals. Despite its importance in the food basket, little is known about the molecular mechanisms involved in protein and lipid synthesis in this species. We collected nine tissues of Sardina pilchardus and reconstructed the transcriptome. In all, 198,597 transcripts were obtained, from which 68,031 are protein-coding. Quality assessment of the transcriptome was performed by back-mapping reads to the transcriptome and by searching for Single Copy Orthologs. Additionally, Gene Ontology and KEGG annotations were retrieved for most of the protein-coding genes. Finally, each library was quantified in terms of Transcripts per Million to disclose their expression patterns.
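The abstract reports expression per library in Transcripts per Million (TPM). The following is a minimal sketch of how that unit is conventionally computed, not the authors' pipeline; the function name `tpm` and the input values in the usage note are hypothetical, and quantification tools commonly substitute an effective length for the raw transcript length used here.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to Transcripts per Million (TPM).

    counts      -- reads assigned to each transcript (illustrative input)
    lengths_bp  -- transcript lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    # Reads per kilobase: normalise each count by its transcript length.
    rpk = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(counts)
    # Rescale so the values within one library sum to one million.
    return [r / total * 1_000_000 for r in rpk]
```

For example, `tpm([10, 20, 30], [1000, 2000, 3000])` gives three equal values, because each transcript has the same read density per kilobase. Since every library sums to one million, TPM values are comparable as proportions across libraries.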
Affiliation(s)
- Jorge Langa
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, UPV/EHU, Leioa, Bizkaia 48940, Spain
- Martin Huret
  - IFREMER, STH/LBH, B.P. 70, Plouzané 29280 France
- Iratxe Montes
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, UPV/EHU, Leioa, Bizkaia 48940, Spain
- Darrell Conklin
  - Department of Computer Science and Artificial Intelligence, Faculty of Computer Science, University of the Basque Country UPV/EHU, San Sebastián, Spain
  - IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Andone Estonba
  - Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, UPV/EHU, Leioa, Bizkaia 48940, Spain
|
13
|
Taheri-Dehkordi A, Naderi R, Martinelli F, Salami SA. Computational screening of miRNAs and their targets in saffron (Crocus sativus L.) by transcriptome mining. Planta 2021; 254:117. [PMID: 34751821 DOI: 10.1007/s00425-021-03761-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Accepted: 10/18/2021] [Indexed: 06/13/2023]
Abstract
A robust workflow for the identification of miRNAs and their targets in saffron was developed. MicroRNA-mediated gene regulation in saffron is potentially involved in several biological processes, including the biosynthesis of highly valuable apocarotenoids. Saffron (Crocus sativus L.) is the most expensive spice in the world and a major source of apocarotenoids. Even though miRNAs (20-24 nt non-coding small RNAs) are important regulators of gene expression at transcriptional and post-transcriptional levels, their role in saffron has not been thoroughly investigated. As a result, a workflow for computational identification of miRNAs and their targets can be useful to uncover the regulatory networks underlying biological processes in this valuable plant. The efficiency of several assembly tools such as Trans-ABySS, Trinity, Bridger, rnaSPAdes, and EvidentialGene was evaluated based on both reference-based and reference-free metrics using transcriptome data. A reliable workflow for computational identification of miRNAs and their targets in saffron was described. The EvidentialGene was found to be the most efficient de novo transcriptome assembler for saffron as a complex triploid model, followed by the Trinity. In total, 66 miRNAs from 19 different families that target 2880 genes, including several transcription factors involved in the flowering transition, were identified. Three of the identified targets were involved in the terpenoids backbone biosynthesis. CsCCD and CsUGT genes involved in the apocarotenoids biosynthetic pathway were targeted by csa-miR156g and csa-miR156b-3p, revealing a unique post-transcriptional regulation dynamic in saffron. The identified miRNAs and their targets add to our understanding of the many biological roles of miRNAs in saffron and shed new light on the control of the apocarotenoid biosynthetic pathway in this valuable plant.
Affiliation(s)
- Ayat Taheri-Dehkordi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
- Roohangiz Naderi
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
- Seyed Alireza Salami
  - Department of Horticultural Science, Faculty of Agricultural Science and Engineering, University of Tehran, Karaj, Iran
|
14
|
Ma X, Tang K, Tang Z, Dong A, Meng Y, Wang P. Organ-specific, integrated omics data-based study on the metabolic pathways of the medicinal plant Bletilla striata (Orchidaceae). BMC Plant Biol 2021; 21:504. [PMID: 34724893 PMCID: PMC8559373 DOI: 10.1186/s12870-021-03288-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Accepted: 10/22/2021] [Indexed: 05/10/2023]
Abstract
BACKGROUND Bletilla striata is one of the important species belonging to the Bletilla genus of Orchidaceae. Since its extracts have an astringent effect on human tissues, B. striata is widely used for hemostasis and healing. Recently, some other beneficial effects have also been uncovered, such as antioxidation, antiinflammation, antifibrotic, and immunomodulatory activities. As a key step towards a thorough understanding on the medicinal ingredient production in B. striata, deciphering the regulatory codes of the metabolic pathways becomes a major task. RESULTS In this study, three organs (roots, tubers and leaves) of B. striata were analyzed by integrating transcriptome sequencing and untargeted metabolic profiling data. Five different metabolic pathways, involved in polysaccharide, sterol, flavonoid, terpenoid and alkaloid biosynthesis, were investigated respectively. For each pathway, the expression patterns of the enzyme-coding genes and the accumulation levels of the metabolic intermediates were presented in an organ-specific way. Furthermore, the relationships between enzyme activities and the levels of the related metabolites were partially inferred. Within the biosynthetic pathways of polysaccharides and flavonoids, long-range phytochemical transportation was proposed for certain metabolic intermediates and/or the enzymes. CONCLUSIONS The data presented by this work could strengthen the molecular basis for further studies on breeding and medicinal uses of B. striata.
Affiliation(s)
- Xiaoxia Ma
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, 310014, China
  - School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Kehua Tang
  - Key Laboratory of Hunan Forest Products and Chemical Industry Engineering, Jishou University, Zhangjiajie, 427000, China
- Zhonghai Tang
  - College of Food Science and Technology, Hunan Agricultural University, Changsha, 410128, China
- Aiwen Dong
  - Key Laboratory of Hunan Forest Products and Chemical Industry Engineering, Jishou University, Zhangjiajie, 427000, China
- Yijun Meng
  - College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 311121, China
- Pu Wang
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, 310014, China
|
15
|
Voshall A, Behera S, Li X, Yu XH, Kapil K, Deogun JS, Shanklin J, Cahoon EB, Moriyama EN. A consensus-based ensemble approach to improve transcriptome assembly. BMC Bioinformatics 2021; 22:513. [PMID: 34674629 PMCID: PMC8532302 DOI: 10.1186/s12859-021-04434-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Accepted: 10/10/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes. RESULTS In this study, we first provide a pipeline to generate a set of the simulated benchmark transcriptome and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or for genome-guided methods when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. CONCLUSIONS Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any de novo assemblers we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. 
Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy both for de novo and genome-guided assemblers can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/ .
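The abstract describes ConSemble as leveraging overlapping predictions between different assemblies. The sketch below illustrates the general idea of consensus voting only; it is not the ConSemble implementation. The function name `consensus_contigs`, the `min_support` threshold, and the use of exact sequence identity as the overlap criterion are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return sequences reported by at least `min_support` assemblies.

    assemblies  -- list of iterables of contig sequences, one per assembler
    min_support -- hypothetical voting threshold, not taken from the abstract
    """
    votes = Counter()
    for contigs in assemblies:
        # Deduplicate first so one assembler cannot vote twice for a sequence.
        for seq in set(contigs):
            votes[seq] += 1
    # Keep only sequences supported by enough independent assemblers.
    return sorted(seq for seq, n in votes.items() if n >= min_support)
```

With four assemblers, a contig that appears in only one output is discarded while one shared by three or more is kept. This is how a voting scheme can remove incorrectly assembled contigs with limited loss of correct ones.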
Affiliation(s)
- Adam Voshall
  - School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital/Harvard Medical School, Boston, MA, 02115, USA
- Sairam Behera
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Xiangjun Li
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Xiao-Hong Yu
  - Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Kushagra Kapil
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Jitender S Deogun
  - Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- John Shanklin
  - Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Edgar B Cahoon
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Etsuko N Moriyama
  - School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
  - Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
|
16
|
Shrestha AMS, I Lilagan CA, B Guiao JE, R Romana-Eguia MR, Ablan Lagman MC. Comparative transcriptome profiling of heat stress response of the mangrove crab Scylla serrata across sites of varying climate profiles. BMC Genomics 2021; 22:580. [PMID: 34325654 PMCID: PMC8323281 DOI: 10.1186/s12864-021-07891-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/24/2020] [Accepted: 07/14/2021] [Indexed: 11/30/2022]
Abstract
Background The fishery and aquaculture of the widely distributed mangrove crab Scylla serrata is a steadily growing, high-value, global industry. Climate change poses a risk to this industry as temperature elevations are expected to threaten the mangrove crab habitat and the supply of mangrove crab juveniles from the wild. It is therefore important to understand the genomic and molecular basis of how mangrove crab populations from sites with different climate profiles respond to heat stress. Towards this, we performed RNA-seq on the gill tissue of S. serrata individuals sampled from 3 sites (Cagayan, Bicol, and Bataan) in the Philippines, under normal and heat-stressed conditions. To compare the transcriptome expression profiles, we designed a 2-factor generalized linear model containing interaction terms, which allowed us to simultaneously analyze within-site response to heat-stress and across-site differences in the response. Results We present the first ever transcriptome assembly of S. serrata obtained from a data set containing 66 Gbases of cleaned RNA-seq reads. With lowly-expressed and short contigs excluded, the assembly contains roughly 17,000 genes with an N50 length of 2,366 bp. Our assembly contains many almost full-length transcripts – 5229 shrimp and 3049 fruit fly proteins have alignments that cover >80% of their sequence lengths to a contig. Differential expression analysis found population-specific differences in heat-stress response. Within-site analysis of heat-stress response showed 177, 755, and 221 differentially expressed (DE) genes in the Cagayan, Bataan, and Bicol group, respectively. Across-site analysis showed that between Cagayan and Bataan, there were 389 genes associated with 48 signaling and stress-response pathways, for which there was an effect of site in the response to heat; and between Cagayan and Bicol, there were 101 such genes affecting 8 pathways. 
Conclusion In light of previous work on climate profiling and on population genetics of marine species in the Philippines, our findings suggest that the variation in thermal response among populations might be derived from acclimatory plasticity due to pre-exposure to extreme temperature variations or from population structure shaped by connectivity which leads to adaptive genetic differences among populations. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-07891-w).
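The Results above summarise assembly contiguity with an N50 length. As a minimal sketch of the standard definition (the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length), and not the authors' own script, a hypothetical helper `n50` could look like this:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half:
            return length
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]` the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8. Filtering out short or lowly expressed contigs before computing N50, as the abstract describes, generally raises the value.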
Affiliation(s)
- Anish M S Shrestha
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Software Technology Department, College of Computer Studies, De La Salle University, Manila, Philippines
- Crissa Ann I Lilagan
  - Software Technology Department, College of Computer Studies, De La Salle University, Manila, Philippines
  - Practical Genomics Laboratory, Center for Natural Science and Environment Research, De La Salle University, Manila, Philippines
  - Department of Biological Sciences, College of Science, University of Santo Tomas, Manila, Philippines
- Joyce Emlyn B Guiao
  - Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
  - Mathematics and Statistics Department, College of Science, De La Salle University, Manila, Philippines
- Maria Rowena R Romana-Eguia
  - Aquaculture Department, Southeast Asian Fisheries Development Center, Binangoan, 1940 Rizal, Philippines
  - Biology Department, College of Science, De La Salle University, Manila, Philippines
- Ma Carmen Ablan Lagman
  - Practical Genomics Laboratory, Center for Natural Science and Environment Research, De La Salle University, Manila, Philippines
  - Biology Department, College of Science, De La Salle University, Manila, Philippines
|
17
|
Luo H, Bu D, Shao L, Li Y, Sun L, Wang C, Wang J, Yang W, Yang X, Dong J, Zhao Y, Li F. Single-cell Long Non-coding RNA Landscape of T Cells in Human Cancer Immunity. Genomics Proteomics Bioinformatics 2021; 19:377-393. [PMID: 34284134 PMCID: PMC8864193 DOI: 10.1016/j.gpb.2021.02.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/27/2020] [Revised: 12/03/2020] [Accepted: 03/06/2021] [Indexed: 01/08/2023]
Abstract
The development of new biomarkers or therapeutic targets for cancer immunotherapies requires deep understanding of T cells. To date, the complete landscape and systematic characterization of long noncoding RNAs (lncRNAs) in T cells in cancer immunity are lacking. Here, by systematically analyzing full-length single-cell RNA sequencing (scRNA-seq) data of more than 20,000 libraries of T cells across three cancer types, we provided the first comprehensive catalog and the functional repertoires of lncRNAs in human T cells. Specifically, we developed a custom pipeline for de novo transcriptome assembly and obtained a novel lncRNA catalog containing 9433 genes. This expanded the current human lncRNA catalog by 16% and nearly doubled the number of lncRNAs expressed in T cells. We found that a portion of expressed genes in single T cells were lncRNAs which had been overlooked by the majority of previous studies. Based on metacell maps constructed by the MetaCell algorithm that partitions scRNA-seq datasets into disjointed and homogenous groups of cells (metacells), 154 signature lncRNA genes were identified. They were associated with effector, exhausted, and regulatory T cell states. Moreover, 84 of them were functionally annotated based on the co-expression networks, indicating that lncRNAs might broadly participate in the regulation of T cell functions. Our findings provide a new point of view and resource for investigating the mechanisms of T cell regulation in cancer immunity as well as for novel cancer-immune biomarker development and cancer immunotherapies.
Affiliation(s)
- Haitao Luo
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
  - Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou 510632, China
- Dechao Bu
  - Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Advanced Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Lijuan Shao
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
  - Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou 510632, China
- Yang Li
  - Department of Gastrointestinal Surgery, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
- Liang Sun
  - Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Advanced Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Ce Wang
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
- Jing Wang
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
  - Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou 510632, China
- Wei Yang
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
- Xiaofei Yang
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
- Jun Dong
  - Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou 510632, China
- Yi Zhao
  - Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Advanced Computing Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Furong Li
  - Translational Medicine Collaborative Innovation Center, The Second Clinical Medical College (Shenzhen People's Hospital), Jinan University, Shenzhen 518020, China
  - Shenzhen Key Laboratory of Stem Cell Research and Clinical Transformation, Shenzhen 518020, China
|
18
|
Sakamoto T, Sasaki S, Yamaguchi N, Nakano M, Sato H, Iwabuchi K, Tabunoki H, Simpson RJ, Bono H. De novo transcriptome analysis for examination of the nutrition metabolic system related to the evolutionary process through which stick insects gain the ability of flight (Phasmatodea). BMC Res Notes 2021; 14:182. [PMID: 33985569 PMCID: PMC8120901 DOI: 10.1186/s13104-021-05600-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/19/2021] [Accepted: 05/05/2021] [Indexed: 12/02/2022]
Abstract
Objective Insects are one of the most evolutionarily successful groups of organisms, and this success is largely due to their flight ability. Interestingly, some stick insects have lost their flight ability despite having wings. To elucidate the shift from wingless to flying forms during insect evolution, we compared the nutritional metabolism system among flight-winged, flightless-winged, and flightless-wingless stick insect groups. Results Here, we report RNA sequencing of the midgut transcriptome of Entoria okinawaensis, a prominent Japanese flightless-wingless stick insect, and a comparative analysis of its transcriptome with publicly available midgut transcriptomes obtained from seven stick insect species. A gene enrichment analysis of differentially expressed genes, including those obtained from winged vs wingless and flight vs flightless comparisons, revealed that carbohydrate metabolic process-related genes were highly expressed in the winged stick insect group. We also found that the expression of the mitochondrial enolase superfamily member 1 transcript was significantly higher in the winged stick insect group than in the wingless stick insect group. Our findings could indicate that carbohydrate metabolic processes are related to the evolutionary process through which stick insects gain the ability of flight. Supplementary Information The online version contains supplementary material available at 10.1186/s13104-021-05600-0.
Affiliation(s)
- Takuma Sakamoto
  - Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Shunya Sasaki
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Nobuki Yamaguchi
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Miho Nakano
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Hiroki Sato
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Kikuo Iwabuchi
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Hiroko Tabunoki
  - Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Richard J Simpson
  - Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
  - Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science (LIMS), La Trobe University, Melbourne, VIC, 3086, Australia
- Hidemasa Bono
  - Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems (ROIS), Mishima, Shizuoka, 411-8540, Japan
  - Program of Biomedical Science, Graduate School of Integrated Sciences for Life, Hiroshima University, 3-10-23 Kagamiyama, Higashi-Hiroshima, Hiroshima, 739-0046, Japan
|
19
|
Jiao X, Shi J, Qin S, Huang D, Wang Y. Dataset of the transcriptomes of Urechis unicinctus to identify differentially expressed genes (DEGs) under different temperature and exposure to open air. Data Brief 2021; 35:106941. [PMID: 33842678 PMCID: PMC8020418 DOI: 10.1016/j.dib.2021.106941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Revised: 02/22/2021] [Accepted: 03/03/2021] [Indexed: 10/29/2022]
Abstract
Urechis unicinctus contains a wide range of bioactive polypeptides and has high edible, economic, and medicinal value. Artificial breeding is therefore an imperative technical breakthrough. Seedling transport, however, has become a primary concern, which makes it essential to understand how Urechis unicinctus responds to various conditions. We compared the transcriptomes of Urechis unicinctus under desiccation and ultraviolet irradiation treatments and at different temperatures. The dataset describing the organism's response to water-temperature variation was generated on the Illumina HiSeq X Ten system and will help in understanding the adaptation of Urechis unicinctus to changing temperature (low, high, and room temperature) and to open air (ultraviolet and desiccation). The transcriptomes were assembled using the isoform sequencing (Iso-seq) method. The functions of the expressed genes were annotated and categorized, and the DEGs are presented.
Affiliation(s)
- Xudong Jiao
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
- Jiaxin Shi
  - College of Oceanic and Atmospheric Sciences, Ocean University of China, Qingdao 266000, China
- Song Qin
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
- Dong Huang
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Agronomy College, Rudong University, Shandong, Yantai 264025, China
- Yinchu Wang
  - Key Laboratory of Coastal Biology and Biological Resources Utilization, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, China
  - Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao 266071, China
|
20
|
Amil-Ruiz F, Herruzo-Ruiz AM, Fuentes-Almagro C, Baena-Angulo C, Jiménez-Pastor JM, Blasco J, Alhama J, Michán C. Constructing a de novo transcriptome and a reference proteome for the bivalve Scrobicularia plana: Comparative analysis of different assembly strategies and proteomic analysis. Genomics 2021; 113:1543-53. [PMID: 33774165 DOI: 10.1016/j.ygeno.2021.03.025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/01/2020] [Revised: 03/17/2021] [Accepted: 03/21/2021] [Indexed: 11/20/2022]
Abstract
Scrobicularia plana is a coastal and estuarine bivalve widely used in ecotoxicological studies. However, the underlying molecular mechanisms for S. plana pollutant responses are hardly known due to the lack of molecular databases. Thus, in this study we present a holistic approach to assess a robust reference transcriptome and proteome of this clam. A mixture of control and metal-exposed individuals was used for mRNA isolation. Four sets of high quality filtered preprocessed reads were generated (two quality scores and two sequenced lengths) and assembled with Mira, Ray and Trinity algorithms. The sixty-four generated assemblies were refined, filtered and evaluated for their proteomic quality. Eight assemblies presented top Detonate scores but one was selected due to its compactness and biological representation, which was generated: (i) from the highest quality dataset (Q20L100), (ii) using Trinity algorithm with all k-mers (AtKa), (iii) removing redundancy by CD-HIT (RR80), and (iv) filtering out poor contigs (F), that was subsequently named Q20L100AtKaRR80F. S. plana proteomic analysis revealed 10,017 peptide groups that corresponded to 2066 proteins with a wide coverage of molecular functions and biological processes, confirming the strength of the database generated.
|
21
|
Parker MT, Knop K, Barton GJ, Simpson GG. 2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing. Genome Biol 2021; 22:72. [PMID: 33648554 PMCID: PMC7919322 DOI: 10.1186/s13059-021-02296-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/27/2020] [Accepted: 02/10/2021] [Indexed: 01/04/2023]
Abstract
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools (https://github.com/bartongroup/2passtools), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.
Affiliation(s)
- Matthew T Parker
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
- Katarzyna Knop
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
- Geoffrey J Barton
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
- Gordon G Simpson
  - School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
  - James Hutton Institute, Invergowrie, DD2 5DA, UK
|
22
|
Shelke RG, Basak S, Rangan L. Development of EST-SSR markers for Pongamia pinnata by transcriptome database mining: cross-species amplification and genetic diversity. Physiol Mol Biol Plants 2020; 26:2225-2241. [PMID: 33268925 PMCID: PMC7688882 DOI: 10.1007/s12298-020-00889-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/22/2020] [Revised: 07/21/2020] [Accepted: 09/30/2020] [Indexed: 06/12/2023]
Abstract
EST-SSR markers were developed from Pongamia pinnata transcriptome libraries. We have successfully utilised EST-SSRs to study the genetic diversity of Indian P. pinnata germplasms and transferability study on legume plants. P. pinnata is a non-edible oil, seed-bearing leguminous tree well known for its multipurpose benefits and acts as a potential source for medicine and biodiesel preparation. Moreover, the plant is not grazable by animal and wildly grown in different agro climatic condition of India. Recently, it is much used in reforestation and rehabilitation of marginal and coal mined land in different part of India. Due to increasing demand for cultivation, understanding of the genetic diversity is important parameter for further breeding and cultivation program. In this investigation, an attempt has been undertaken to develop novel EST-SSR markers by analyzing the assembled transcriptome from previously published Illumina libraries of P. pinnata, which is cross transferrable to legume plants. Twenty EST-SSR markers were developed from oil yielding and secondary metabolite biosynthesis genes. To our knowledge, this is the first EST-SSR marker based genetic diversity study on Indian P. pinnata germplasms. The genetic diversity parameter analysis of P. pinnata showed that the Gangetic plain and Eastern India are highly diverse compared to the Central Deccan and Western germplasms. The lowest genetic diversity in the Western region may be due to the pressure of lower precipitation, high-temperature stress and reduced groundwater availability. Nevertheless, the highest genetic diversity of Gangetic plain and Eastern India may be due to the higher groundwater availability, high precipitation, higher temperature fluctuations and growing by the side of glacier-fed river water. Thus, our study shows the evidence of natural selection on the genetic diversity of P. pinnata germplasms of the Indian subcontinent.
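Mining SSRs from an assembled transcriptome generally amounts to scanning each transcript for short, tandemly repeated motifs. The sketch below is an illustration only: the abstract does not state which tool or thresholds the authors used, so the motif range (2 to 6 bp), the minimum of five repeats, and the function name `find_ssrs` are all assumptions made for this example.

```python
import re


def find_ssrs(seq, min_repeats=5, min_motif=2, max_motif=6):
    """Report perfect tandem repeats as (start, motif, repeat_count) tuples.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    # Lazy motif capture so the shortest repeating unit is tried first,
    # followed by at least (min_repeats - 1) further copies of that unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip homopolymer runs such as "AAAA..." reported as motif "AA".
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy transcript containing a (GA)6 repeat starting at index 2.
print(find_ssrs("TTGAGAGAGAGAGACC"))
```

Candidate loci found this way would still need flanking-primer design and wet-lab validation before they could serve as markers.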
Affiliation(s)
- Rahul G. Shelke
  - Applied Biodiversity Lab, Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781 039 India
- Supriyo Basak
  - Department of Bioscience and Biotechnology, Banasthali Vidyapith, Vanasthali, Rajasthan 304 022 India
- Latha Rangan
  - Applied Biodiversity Lab, Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781 039 India
|
23
|
Chiu YL, Shikina S, Yoshioka Y, Shinzato C, Chang CF. De novo transcriptome assembly from the gonads of a scleractinian coral, Euphyllia ancora: molecular mechanisms underlying scleractinian gametogenesis. BMC Genomics 2020; 21:732. [PMID: 33087060 PMCID: PMC7579821 DOI: 10.1186/s12864-020-07113-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/01/2020] [Accepted: 09/29/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND Sexual reproduction of scleractinians has captured the attention of researchers and the general public for decades. Although extensive ecological data has been acquired, underlying molecular and cellular mechanisms remain largely unknown. In this study, to better understand mechanisms underlying gametogenesis, we isolated ovaries and testes at different developmental phases from a gonochoric coral, Euphyllia ancora, and adopted a transcriptomic approach to reveal sex- and phase-specific gene expression profiles. In particular, we explored genes associated with oocyte development and maturation, spermiogenesis, sperm motility / capacitation, and fertilization. RESULTS 1.6 billion raw reads were obtained from 24 gonadal samples. De novo assembly of trimmed reads, and elimination of contigs derived from symbiotic dinoflagellates (Symbiodiniaceae) and other organisms yielded a reference E. ancora gonadal transcriptome of 35,802 contigs. Analysis of 4 developmental phases identified 2023 genes that were differentially expressed during oogenesis and 678 during spermatogenesis. In premature/mature ovaries, 631 genes were specifically upregulated, with 538 in mature testes. Upregulated genes included those involved in gametogenesis, gamete maturation, sperm motility / capacitation, and fertilization in other metazoans, including humans. Meanwhile, a large number of genes without homology to sequences in the SWISS-PROT database were also observed among upregulated genes in premature / mature ovaries and mature testes. CONCLUSIONS Our findings show that scleractinian gametogenesis shares many molecular characteristics with that of other metazoans, but it also possesses unique characteristics developed during cnidarian and/or scleractinian evolution. To the best of our knowledge, this study is the first to create a gonadal transcriptome assembly from any scleractinian. 
This study and associated datasets provide a foundation for future studies regarding gametogenesis and differences between male and female colonies from molecular and cellular perspectives. Furthermore, our transcriptome assembly will be a useful reference for future development of sex-specific and/or stage-specific germ cell markers that can be used in coral aquaculture and ecological studies.
Affiliation(s)
- Yi-Ling Chiu
  - Doctoral Program in Marine Biotechnology, National Taiwan Ocean University, Keelung, 20224, Taiwan
  - Doctoral Program in Marine Biotechnology, Academia Sinica, Taipei, 11529, Taiwan
- Shinya Shikina
  - Institute of Marine Environment and Ecology, National Taiwan Ocean University, Keelung, Taiwan
  - Center of Excellence for the Oceans, National Taiwan Ocean University, 2 Pei-Ning Rd, Keelung, 20224, Taiwan
- Yuki Yoshioka
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, 277-8564, Japan
- Chuya Shinzato
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, 277-8564, Japan
- Ching-Fong Chang
  - Center of Excellence for the Oceans, National Taiwan Ocean University, 2 Pei-Ning Rd, Keelung, 20224, Taiwan
  - Department of Aquaculture, National Taiwan Ocean University, Keelung, Taiwan
|
24
|
Hartmann S, Preick M, Abelt S, Scheffel A, Hofreiter M. Annotated genome sequences of the carnivorous plant Roridula gorgonias and a non-carnivorous relative, Clethra arborea. BMC Res Notes 2020; 13:426. [PMID: 32912303 PMCID: PMC7488092 DOI: 10.1186/s13104-020-05254-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Accepted: 08/24/2020] [Indexed: 11/21/2022]
Abstract
Objective Plant carnivory is distributed across the tree of life and has evolved at least six times independently, but sequenced and annotated nuclear genomes of carnivorous plants are currently lacking. We have sequenced and structurally annotated the nuclear genome of the carnivorous Roridula gorgonias and that of a non-carnivorous relative, Madeira’s lily-of-the-valley-tree, Clethra arborea, both within the Ericales. This data adds an important resource to study the evolutionary genetics of plant carnivory across angiosperm lineages and also for functional and systematic aspects of plants within the Ericales. Results Our assemblies have total lengths of 284 Mbp (R. gorgonias) and 511 Mbp (C. arborea) and show high BUSCO scores of 84.2% and 89.5%, respectively. We used their predicted genes together with publicly available data from other Ericales’ genomes and transcriptomes to assemble a phylogenomic data set for the inference of a species tree. However, groups of orthologs showed a marked absence of species represented by a transcriptome. We discuss possible reasons and caution against combining predicted genes from genome- and transcriptome-based assemblies.
Affiliation(s)
- Stefanie Hartmann
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Michaela Preick
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- Silke Abelt
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
- André Scheffel
  - Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam, Germany
- Michael Hofreiter
  - Institute for Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476, Potsdam, Germany
|
25
|
McGarvey P, Huang J, McCoy M, Orvis J, Katsir Y, Lotringer N, Nesher I, Kavarana M, Sun M, Peet R, Meiri D, Madhavan S. De novo assembly and annotation of transcriptomes from two cultivars of Cannabis sativa with different cannabinoid profiles. Gene 2020; 762:145026. [PMID: 32781193 DOI: 10.1016/j.gene.2020.145026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2020] [Accepted: 07/31/2020] [Indexed: 10/23/2022]
Abstract
Cannabis has been cultivated for millennia for medicinal, industrial and recreational uses. Our long-term goal is to compare the transcriptomes of cultivars with different cannabinoid profiles for therapeutic purposes. Here we describe the de novo assembly, annotation and initial analysis of two cultivars of Cannabis, a high THC variety and a CBD plus THC variety. Cultivars were grown under different lighting conditions; flower buds were sampled over 71 days. Cannabinoid profiles were determined by ESI-LC/MS. RNA samples were sequenced using the HiSeq4000 platform. Transcriptomes were assembled using the DRAP pipeline and annotated using the BLAST2GO pipeline and other tools. Each transcriptome contained over twenty thousand protein encoding transcripts with ORFs and flanking sequence. Identification of transcripts for cannabinoid pathway and related enzymes showed full-length ORFs that align with the draft genomes of the Purple Kush and Finola cultivars. Two transcripts were found for olivetolic acid cyclase (OAC) that mapped to distinct locations on the Purple Kush genome suggesting multiple genes for OAC are expressed in some cultivars. The ability to make high quality annotated reference transcriptomes in Cannabis or other plants can promote rapid comparative analysis between cultivars and growth conditions in Cannabis and other organisms without annotated genome assemblies.
Affiliation(s)
- Peter McGarvey
  - Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Jiahao Huang
  - Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Matthew McCoy
  - Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Joshua Orvis
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Yael Katsir
  - Technion - Israel Institute of Technology, Haifa, Israel
- Mingyang Sun
  - Teewinot Life Sciences Corporation, Tampa, FL, USA
- David Meiri
  - Technion - Israel Institute of Technology, Haifa, Israel
- Subha Madhavan
  - Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
|
26
|
Prjibelski AD, Puglia GD, Antipov D, Bushmanova E, Giordano D, Mikheenko A, Vitale D, Lapidus A. Extending rnaSPAdes functionality for hybrid transcriptome assembly. BMC Bioinformatics 2020; 21:302. [PMID: 32703149 PMCID: PMC7379828 DOI: 10.1186/s12859-020-03614-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/14/2020] [Accepted: 06/18/2020] [Indexed: 11/29/2022]
Abstract
BACKGROUND De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. Recently emerged possibility to generate long RNA reads, such as PacBio and Oxford Nanopores, may dramatically improve the assembly quality, and thus the consecutive analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. RESULTS In this work we present a novel method that allows to perform high-quality de novo transcriptome assemblies by combining accuracy and reliability of short reads with exon structure information carried out from long error-prone reads. The algorithm is designed by incorporating existing hybridSPAdes approach into rnaSPAdes pipeline and adapting it for transcriptomic data. CONCLUSION To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using an existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms comparing to the case when only short-read data is used.
Affiliation(s)
- Andrey D Prjibelski
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Giuseppe D Puglia
  - Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Elena Bushmanova
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Daniela Giordano
  - Department of Electrical, Electronics and Computer Engineering, University of Catania, Catania, Italy
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Domenico Vitale
  - Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo, Catania, Italy
- Alla Lapidus
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
|
27
|
Vechtova P, Fussy Z, Cegan R, Sterba J, Erhart J, Benes V, Grubhoffer L. Catalogue of stage-specific transcripts in Ixodes ricinus and their potential functions during the tick life-cycle. Parasit Vectors 2020; 13:311. [PMID: 32546252 DOI: 10.1186/s13071-020-04173-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/07/2020] [Accepted: 06/05/2020] [Indexed: 12/15/2022]
Abstract
Background The castor bean tick Ixodes ricinus is an important vector of several clinically important diseases, whose prevalence increases with accelerating global climate changes. Characterization of a tick life-cycle is thus of great importance. However, researchers mainly focus on specific organs of fed life stages, while early development of this tick species is largely neglected. Methods In an attempt to better understand the life-cycle of this widespread arthropod parasite, we sequenced the transcriptomes of four life stages (egg, larva, nymph and adult female), including unfed and partially blood-fed individuals. To enable a more reliable identification of transcripts and their comparison in all five transcriptome libraries, we validated an improved-fit set of five I. ricinus-specific reference genes for internal standard normalization of our transcriptomes. Then, we mapped biological functions to transcripts identified in different life stages (clusters) to elucidate life stage-specific processes. Finally, we drew conclusions from the functional enrichment of these clusters specifically assigned to each transcriptome, also in the context of recently published transcriptomic studies in ticks. Results We found that reproduction-related transcripts are present in both fed nymphs and fed females, underlining the poorly documented importance of ovaries as moulting regulators in ticks. Additionally, we identified transposase transcripts in tick eggs suggesting elevated transposition during embryogenesis, co-activated with factors driving developmental regulation of gene expression. Our findings also highlight the importance of the regulation of energetic metabolism in tick eggs during embryonic development and glutamate metabolism in nymphs. Conclusions Our study presents novel insights into stage-specific transcriptomes of I. ricinus and extends the current knowledge of this medically important pathogen, especially in the early phases of its development.
|
28
|
Arce-Leal ÁP, Bautista R, Rodríguez-Negrete EA, Manzanilla-Ramírez MÁ, Velázquez-Monreal JJ, Méndez-Lozano J, Bejarano ER, Castillo AG, Claros MG, Leyva-López NE. De novo assembly and functional annotation of Citrus aurantifolia transcriptome from Candidatus Liberibacter asiaticus infected and non-infected trees. Data Brief 2020; 29:105198. [PMID: 32071978 PMCID: PMC7011030 DOI: 10.1016/j.dib.2020.105198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/05/2019] [Revised: 01/07/2020] [Accepted: 01/20/2020] [Indexed: 12/03/2022]
Abstract
Mexican lime (Citrus aurantifolia) belongs to the Rutaceae family and nowadays is one of the major commercial citrus crops in different countries. In Mexico, Mexican lime production is impaired by Huanglongbing (HLB) disease associated with Candidatus Liberibacter asiaticus (CLas) bacteria. To date, transcriptomic studies of CLas-Citrus interaction have been performed mainly in sweet citrus models at symptomatic (early) stage where pleiotropic responses could mask important, pathogen-driven host modulation as well as host antibacterial responses. Additionally, well-assembled reference transcriptomes for acid limes including C. aurantifolia are not available. The development of improved transcriptomic resources for CLas-citrus pathosystem, including both asymptomatic (early) and symptomatic (late) stages, could accelerate the understanding of the disease. Here, we provide the first transcriptomic analysis from healthy and HLB-infected C. aurantifolia leaves at both asymptomatic and symptomatic stages, using an RNA-seq approach in the Illumina NextSeq 500 platform. The construction of the assembled transcriptome was conducted using the predesigned workflow Transflow and a total of 41,522 tentative transcripts (TTs) were obtained. These C. aurantifolia TTs were functionally annotated using TAIR10 and UniProtKB databases. All raw reads were deposited in the NCBI SRA with accession numbers SRR10353556, SRR10353558, SRR10353560 and SRR10353562. Overall, this dataset adds new transcriptomic valuable tools for future breeding programs, will allow the design of novel diagnostic molecular markers, and will be an essential tool for studying the HLB disease.
Affiliation(s)
- Ángela Paulina Arce-Leal
  - Instituto Politécnico Nacional, CIIDIR-Unidad Sinaloa, Departamento de Biotecnología Agrícola, Mexico
- Rocío Bautista
  - Plataforma Andaluza de Bioinformática, Universidad de Málaga, Malaga, Spain
- Edgar A Rodríguez-Negrete
  - CONACyT, Instituto Politécnico Nacional, CIIDIR-Unidad Sinaloa, Departamento de Biotecnología Agrícola, Mexico
- Jesús Méndez-Lozano
  - Instituto Politécnico Nacional, CIIDIR-Unidad Sinaloa, Departamento de Biotecnología Agrícola, Mexico
- Eduardo R Bejarano
  - Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora (IHSM-UMA-CSIC), Área de Genética, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- Araceli G Castillo
  - Instituto de Hortofruticultura Subtropical y Mediterránea La Mayora (IHSM-UMA-CSIC), Área de Genética, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
- M Gonzalo Claros
  - Plataforma Andaluza de Bioinformática, Universidad de Málaga, Malaga, Spain
  - Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Malaga, Spain
- Norma Elena Leyva-López
  - Instituto Politécnico Nacional, CIIDIR-Unidad Sinaloa, Departamento de Biotecnología Agrícola, Mexico
|
29
|
Arvind K, Rajesh MK, Josephrajkumar A, Grace T. Dataset of de novo assembly and functional annotation of the transcriptome of certain developmental stages of coconut rhinoceros beetle, Oryctes rhinoceros L. Data Brief 2020; 28:105036. [PMID: 31921949 DOI: 10.1016/j.dib.2019.105036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/17/2019] [Accepted: 12/12/2019] [Indexed: 11/20/2022]
Abstract
The coconut rhinoceros beetle, Oryctes rhinoceros L. (Insecta: Coleoptera: Scarabaeidae: Dynastinae) is one of the world's most important endemic and incessant pests of coconut (particularly in India and Southeast Asia), causing an estimated 10% yield loss in the crop. Various management strategies formulated and implemented to control this pest include bioagents, insecticide sprays, liquid formulations, pheromone traps, and botanical formulations. Also, potential microbial bioagents viz., Oryctes rhinoceros nudivirus (OrNV) and Metarhizium anisopliae have been implemented as biological control agents and this has led to a beneficial reduction of the pest population unless significant immigration occurs. To date, research and development activities are still on-going for the successful management of the pest; yet advances in understanding at the molecular level have been limited because basic genomic information is lacking for this cosmopolitan pest. Transcriptome approach has been proved extremely useful in finding potential genes for pest control. Transcriptome analysis aids in gaining insights into the transcriptional changes which occur during different developmental stages of an organism. We have performed RNA sequencing of certain different developmental stages of O. rhinoceros viz., early instar larva, late instar larva, pupa, and adult, in an Illumina HiSeq™ 2500 platform. Due to the unavailability of O. rhinoceros genome, the RNA-seq data generated were assembled de novo using Trinity and annotated following redundancy removal. A dataset of 87,451 transcripts, which resulted after redundancy removal, were annotated using the NCBI non-redundant (nr) protein and Uniprot databases. The data furnished could be used by others working in the development of pest management strategies, especially the identification of molecular targets for effective pest control. This information allows a better understanding of O. rhinoceros biology which would contribute to outlining a new generation of stage-specific, environmentally friendly pest management techniques.
|
30
|
Abstract
In the last few years, long non-coding RNAs (lncRNAs) have been widely studied in humans, and their relevance for physiological and pathological conditions has been demonstrated. In parasites, there are only a few works, such as in Plasmodium falciparum, where it was shown that an lncRNA regulates the expression of a gene associated with immune system evasion, also indicating the relevance of understanding the role of this class of RNAs in parasites. In Schistosoma mansoni, in the last 2 years, there were four published articles related to the annotation of lncRNAs in different life cycle stages using RNA-Seq libraries. In order to make this process of lncRNA identification and annotation more accessible to biologists with no bioinformatics training, considering the growing number of S. mansoni RNA-Seq libraries publicly available from different sources, such as ovary tissues from bi-sex and single-sex infections, and the potential of lncRNAs as therapeutic targets, we provide this step-by-step protocol of lncRNA identification and quantification. This guide includes the download of RNA-Seq libraries from a public database and reads processing and mapping against the genome, transcript reconstruction, novel lncRNA identification, transcripts expression level determination, and the identification of differentially expressed lncRNAs.
|
31
|
Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 2019; 20:278. [PMID: 31842956 PMCID: PMC6912988 DOI: 10.1186/s13059-019-1910-1] [Citation(s) in RCA: 674] [Impact Index Per Article: 134.8] [Received: 09/10/2019] [Accepted: 12/02/2019] [Indexed: 11/13/2022]
Abstract
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
Affiliation(s)
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
- Aleksey V. Zimin
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Geo M. Pertea
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Roham Razaghi
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Steven L. Salzberg
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
- Mihaela Pertea
  - Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21205 USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
|
32
|
Zhang M, Heikkinen L, Knott KE, Wong G. De novo transcriptome assembly of a facultative parasitic nematode Pelodera (syn. Rhabditis) strongyloides. Gene 2019; 710:30-38. [PMID: 31128222 DOI: 10.1016/j.gene.2019.05.041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/15/2018] [Revised: 04/01/2019] [Accepted: 05/21/2019] [Indexed: 01/06/2023]
Abstract
Pelodera strongyloides is a generally free-living gonochoristic facultative nematode. The whole genomic sequence of P. strongyloides remains unknown but 4 small subunit ribosomal RNA (ssrRNA) gene sequences are available. This project launched a de novo transcriptome assembly with 100 bp paired-end RNA-seq reads from normal, starved and wet-plate cultured animals. Trinity assembly tool generated 104,634 transcript contigs with N50 contig being 2195 bp and average contig length at 1103 bp. Transcriptome BLASTX matching results of five nematodes (C. elegans, Strongyloides stercoralis, Necator americanus, Trichuris trichiura, and Pristionchus pacificus) were consistent with their evolutionary relationships. Sixteen genes were identified to be homologous to key elements of the C. elegans RNA interference system, such as Dicer, Argonaute, RNA-dependent RNA polymerase and double strand RNA transport proteins. In starved samples, we observed up-regulation of cuticle related genes and 3 dauer formation genes. Dauer morphology was captured with enlarged phasmid under light microscopy, and dauer and normal larvae counts in clumps had a Pearson's product-moment correlation of 0.805 with P-value = 0.0088. Our results demonstrate that P. strongyloides could be used for studying nematode-related human or pet parasitic diseases. The sequenced assembled transcriptome reported here may be useful to understand the evolution of parasitism in Nematoda.
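The assembly statistics quoted in this abstract (N50 and average contig length) can be illustrated with a minimal Python sketch. The function names and the toy contig lengths are hypothetical; they are not taken from the paper or from Trinity's own reporting scripts.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def mean_length(lengths):
    """Average contig length in bases."""
    return sum(lengths) / len(lengths)


# Toy example (made-up lengths): total = 1500, so half = 750;
# 500 + 400 = 900 crosses the halfway point at the 400 bp contig.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs), mean_length(toy_contigs))
```

Because N50 weights long contigs more heavily than the mean does, an N50 well above the average length (2195 bp versus 1103 bp here) is the usual pattern for a transcriptome with many short contigs.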
Affiliation(s)
- Menglei Zhang
- Centre of Reproduction, Development and Aging, Faculty of Health Sciences, University of Macau, Macau, SAR, China
- Garry Wong
- Centre of Reproduction, Development and Aging, Faculty of Health Sciences, University of Macau, Macau, SAR, China.
|
33
|
Qi X, Ogden EL, Ehlenfeldt MK, Rowland LJ. Dataset of de novo assembly and functional annotation of the transcriptome of blueberry (Vaccinium spp.). Data Brief 2019; 25:104390. [PMID: 31497632 DOI: 10.1016/j.dib.2019.104390] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/16/2019] [Revised: 07/25/2019] [Accepted: 08/05/2019] [Indexed: 11/22/2022]
Abstract
Blueberry is an economically important berry crop. Both production and consumption of blueberries have increased sharply worldwide in recent years at least partly due to their known health benefits. The development of improved genomic resources for blueberry, such as a well-assembled genome and transcriptome, could accelerate breeding through genomic-assisted approaches. To enrich available transcriptome data and identify genes potentially involved in fruit quality, RNA sequencing was performed on fruit tissue from two northern-adapted hybrid blueberry breeding populations. RNA-seq was carried out using the Illumina HiSeq™ 2500 platform. Because of the absence of a reference-grade genome for blueberry, a transcriptome was de novo assembled from this RNA-seq data and other publicly available transcriptome data from blueberry downloaded from the National Center for Biotechnology Information (NCBI) Short Read Archive (SRA) using Trinity. After removing redundancy, this resulted in a dataset of 91,861 blueberry unigenes. This unigene dataset was functionally annotated using the NCBI-Nr protein database. All raw reads from the breeding populations were deposited in the NCBI SRA with accession numbers SRR6281886, SRR6281887, SRR6281888, and SRR6281889. The de novo transcriptome assembly was deposited at NCBI Transcriptome Shotgun Assembly (TSA) database with accession number GGAB00000000. These data will provide real expression evidence for the blueberry genome gene prediction and gene functional annotation and a reference transcriptome for future gene expression studies involving blueberry fruit.
|
34
|
Voshall A, Moriyama EN. Next-generation transcriptome assembly and analysis: Impact of ploidy. Methods 2019; 176:14-24. [PMID: 31176772 DOI: 10.1016/j.ymeth.2019.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/02/2019] [Revised: 05/30/2019] [Accepted: 06/01/2019] [Indexed: 10/26/2022]
Abstract
Whole genome duplications (WGD) occur widely in plants, but the effects of these events impact all branches of life. WGD events have major evolutionary impacts, often leading to major structural changes within the chromosomes and massive changes in gene expression that facilitate rapid speciation and gene diversification. Even for species that currently have diploid genomes, the impact of ancestral duplication events is still present in the genomes, especially in the context of highly similar gene families that are retained from WGD. However, the impact of these ploidies on various bioinformatics workflows has not been studied well. In this review, we give an overview of the biological significance of polyploidy in different organisms. We describe the impact of having polyploid transcriptomes on bioinformatics analyses, especially focusing on transcriptome assembly and transcript quantification. We discuss the benefits of using simulated benchmarking data when we examine the performance of various methods. We also present an example strategy to generate simulated allopolyploid transcriptomes and RNAseq datasets and how these benchmark datasets can be used to assess the performance of transcript assembly and quantification methods. Our benchmarking study shows that all transcriptome assembly methods are affected by having polyploid genomes. Quantification accuracy is also impacted by polyploidy depending on the method. These simulated datasets can be adapted for testing other analyses, such as read mapping, variant calling, and differential expression, using biologically realistic conditions.
Affiliation(s)
- Adam Voshall
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
|
35
|
Abstract
We present TransLiG, a new de novo transcriptome assembler, which is able to integrate the sequence depth and paired-end information into the assembling procedure by phasing paths and iteratively constructing line graphs starting from splicing graphs. TransLiG is shown to be significantly superior to all the salient de novo assemblers in both accuracy and computing resources when tested on artificial and real RNA-seq data. TransLiG is freely available at https://sourceforge.net/projects/transcriptomeassembly/files/.
Affiliation(s)
- Juntao Liu
- School of Mathematics, Shandong University, Jinan, 250100 China
- Ting Yu
- School of Mathematics, Shandong University, Jinan, 250100 China
- Zengchao Mu
- School of Mathematics, Shandong University, Jinan, 250100 China
- Guojun Li
- School of Mathematics, Shandong University, Jinan, 250100 China
|
36
|
Abstract
The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and collected at the NCBI reference sequence database section. Gene reconstruction from transcribed gene evidence of RNA-seq now can accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from current NCBI and Ensembl reference pig gene sets, additional alternate transcripts, and other improvements. Methodology for accurate and complete gene set reconstruction from RNA is used: the automated SRA2Genes pipeline of EvidentialGene project.
|
37
|
Vendrami DLJ, Forcada J, Hoffman JI. Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal. BMC Genomics 2019; 20:72. [PMID: 30669975 DOI: 10.1186/s12864-019-5440-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/12/2018] [Accepted: 01/08/2019] [Indexed: 01/01/2023]
Abstract
BACKGROUND Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies. RESULTS PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion. CONCLUSIONS Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism's genome.
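The abstract above mentions predicting restriction site frequencies from GC content. The following is a minimal sketch of the general idea behind a GC-content model, assuming independent bases with G and C equally likely and A and T equally likely. It is not PredRAD's code; the function names and the EcoRI example are illustrative assumptions.

```python
def site_probability(recognition_seq, gc_content):
    """Probability that a random window matches the recognition sequence,
    assuming independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must lie between 0 and 1")
    p_gc = gc_content / 2
    p_at = (1 - gc_content) / 2
    prob = 1.0
    for base in recognition_seq.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base: {base}")
    return prob


def expected_sites(recognition_seq, gc_content, sequence_length):
    """Expected number of matching windows in a sequence of the given length."""
    windows = max(sequence_length - len(recognition_seq) + 1, 0)
    return site_probability(recognition_seq, gc_content) * windows


# EcoRI recognizes GAATTC; the GC content and length here are hypothetical inputs.
print(expected_sites("GAATTC", 0.41, 1_000_000))
```

Di- and trinucleotide models replace the independence assumption with observed neighbour frequencies, which is one reason they can give different predictions from the GC content model, as the abstract reports.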
|
38
|
Wang Y, Yang H, Zi C, Wang Z. Transcriptomic analysis of the red and green light responses in Columba livia domestica. 3 Biotech 2019; 9:20. [PMID: 30622858 DOI: 10.1007/s13205-018-1551-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/19/2018] [Accepted: 12/20/2018] [Indexed: 11/29/2022]
Abstract
In this study, 108 paired White King pigeons, randomly divided into three compartments, were exposed to green light, red light, and white light followed by 15 h of light exposure, for a 6-month period. Three female birds from each group were selected and ovarian stromal tissue was collected. Pigeon reproductive data were also recorded every day. We performed transcriptome assembly on several tissue samples using Illumina HiSeq 2000 and analyzed differentially expressed genes involving follicle development mechanisms. Reproductive data confirmed that exposure to red and green lights improved pigeon reproduction. In total, approximately 158,080 unigenes with an average length of 753 bp were obtained using the Trinity program. Gene Ontology, Clusters of Orthologous Groups, and the Kyoto Encyclopedia of Genes and Genomes were used to annotate and classify these unigenes. Large numbers of differentially expressed genes were discovered through pairwise comparisons between groups treated with monochromatic light versus white light. Some of these genes are associated with steroid hormone biosynthesis, cell cycle and circadian rhythm. Furthermore, qRT-PCR was used to detect the relative expression levels of randomly selected genes. A total of 17,419 potential simple sequence repeats were also identified. Our study provides insights into potential molecular mechanisms and genes that regulate pigeon reproduction in response to monochromatic light exposure. Our results and data will facilitate a further investigation into the molecular mechanisms behind the effects of red and green lights on follicle development and reproduction in the pigeon.
Affiliation(s)
- Ying Wang
- College of Animal Science and Technology, Yangzhou University, Yangzhou, 225009 Jiangsu Province China
- Haiming Yang
- College of Animal Science and Technology, Yangzhou University, Yangzhou, 225009 Jiangsu Province China
- Chen Zi
- Department of Pathology, Linyi People's Hospital, Linyi, 276000 Shandong Province China
- Zhiyue Wang
- College of Animal Science and Technology, Yangzhou University, Yangzhou, 225009 Jiangsu Province China
|
39
|
Pertea M, Shumate A, Pertea G, Varabyou A, Breitwieser FP, Chang YC, Madugundu AK, Pandey A, Salzberg SL. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol 2018; 19:208. [PMID: 30486838 PMCID: PMC6260756 DOI: 10.1186/s13059-018-1590-2] [Citation(s) in RCA: 162] [Impact Index Per Article: 27.0] [Received: 06/02/2018] [Accepted: 11/16/2018] [Indexed: 01/06/2023]
Abstract
We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.
Affiliation(s)
- Mihaela Pertea
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Alaina Shumate
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Geo Pertea
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ales Varabyou
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Florian P Breitwieser
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Yu-Chi Chang
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Anil K Madugundu
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Institute of Bioinformatics, International Technology Park, Bangalore, India
- Manipal Academy of Higher Education (MAHE), Manipal, Karnataka, India
- Present address: Center for Individualized Medicine and Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Departments of Biological Chemistry, Pathology, Neurology, and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Present address: Center for Individualized Medicine and Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Steven L Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
|
40
|
Chiu R, Nip KM, Chu J, Birol I. TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics 2018; 11:79. [PMID: 30200994 PMCID: PMC6131862 DOI: 10.1186/s12920-018-0402-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/24/2018] [Accepted: 08/31/2018] [Indexed: 12/22/2022]
Abstract
BACKGROUND RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. RESULTS Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as Tophat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies including gene fusions, tandem duplications, insertions and deletions in agreement with published literature results. Moreover, also in this dataset, TAP captured AML-specific splicing variants such as skipped exons and novel splice sites reported in studies elsewhere. Running time of TAP on 100-150 million read pairs and a 580-gene set is one to two hours on a 48-core machine. CONCLUSIONS We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 100-570 West 7th Ave, Vancouver, BC, V5Z 4S6, Canada
- Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 100-570 West 7th Ave, Vancouver, BC, V5Z 4S6, Canada
- Justin Chu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 100-570 West 7th Ave, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 100-570 West 7th Ave, Vancouver, BC, V5Z 4S6, Canada; Department of Medical Genetics, The University of British Columbia, Vancouver, BC, Canada.
|
41
|
Visser EA, Wegrzyn JL, Myburg AA, Naidoo S. Defence transcriptome assembly and pathogenesis related gene family analysis in Pinus tecunumanii (low elevation). BMC Genomics 2018; 19:632. [PMID: 30139335 PMCID: PMC6108113 DOI: 10.1186/s12864-018-5015-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/03/2018] [Accepted: 08/14/2018] [Indexed: 12/23/2022]
Abstract
BACKGROUND Fusarium circinatum is a pressing threat to the cultivation of many economically important pine tree species. Efforts to develop effective disease management strategies can be aided by investigating the molecular mechanisms involved in the host-pathogen interaction between F. circinatum and pine species. Pinus tecunumanii and Pinus patula are two closely related tropical pine species that differ widely in their resistance to F. circinatum challenge, being resistant and susceptible respectively, providing the potential for a useful pathosystem to investigate the molecular responses underlying resistance to F. circinatum. However, no genomic resources are available for P. tecunumanii. Pathogenesis-related proteins are classes of proteins that play important roles in plant-microbe interactions, e.g. chitinases, proteins that break down the major structural component of fungal cell walls. Generating a reference sequence for P. tecunumanii and characterizing pathogenesis related gene families in these two pine species is an important step towards unravelling the pine-F. circinatum interaction. RESULTS Eight reference based and 12 de novo assembled transcriptomes were produced for juvenile shoot tissue from both species. EvidentialGene pipeline redundancy reduction, expression filtering, protein clustering and taxonomic filtering produced a 50 Mb shoot transcriptome consisting of 28,621 contigs for P. tecunumanii and a 72 Mb shoot transcriptome consisting of 52,735 contigs for P. patula. Predicted protein sequences encoded by the assembled transcriptomes were clustered with reference proteomes from 92 other species to identify pathogenesis related gene families in P. patula, P. tecunumanii and other pine species. CONCLUSIONS The P. tecunumanii transcriptome is the first gene catalogue for the species, representing an important resource for studying resistance to the pitch canker pathogen, F. circinatum. This study also constitutes, to our knowledge, the largest index of gymnosperm PR-genes to date.
Affiliation(s)
- Erik A. Visser
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private bag X20, Pretoria, 0028 South Africa
- Jill L. Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269 USA
- Alexander A. Myburg
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private bag X20, Pretoria, 0028 South Africa
- Sanushka Naidoo
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private bag X20, Pretoria, 0028 South Africa
|
42
|
Hacking J, Bertozzi T, Moussalli A, Bradford T, Gardner M. Characterisation of major histocompatibility complex class I transcripts in an Australian dragon lizard. Dev Comp Immunol 2018; 84:164-171. [PMID: 29454831 DOI: 10.1016/j.dci.2018.02.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/15/2017] [Revised: 02/10/2018] [Accepted: 02/10/2018] [Indexed: 06/08/2023]
Abstract
Characterisation of squamate major histocompatibility complex (MHC) genes has lagged behind other taxonomic groups. MHC genes encode cell-surface glycoproteins that present self- and pathogen-derived peptides to T cells and play a critical role in pathogen recognition. Here we characterise MHC class I transcripts for an agamid lizard (Ctenophorus decresii) and investigate the evolution of MHC class I in Iguanian lizards. An iterative assembly strategy was used to identify six full-length C. decresii MHC class I transcripts, which were validated as likely to encode classical class I MHC molecules. Evidence for exon shuffling recombination was uncovered for C. decresii transcripts and Bayesian phylogenetic analysis of Iguanian MHC class I sequences revealed a pattern expected under a birth-and-death mode of evolution. This work provides a stepping stone towards further research on the agamid MHC class I region.
Affiliation(s)
- Jessica Hacking
- College of Science and Engineering, Flinders University, Bedford Park, SA, 5042, Australia.
- Terry Bertozzi
- Evolutionary Biology Unit, South Australian Museum, Adelaide, SA, 5000, Australia; School of Biological Sciences, University of Adelaide, Adelaide, SA, 5005, Australia.
- Adnan Moussalli
- Sciences Department, Museum Victoria, Carlton Gardens, VIC, 3053, Australia.
- Tessa Bradford
- College of Science and Engineering, Flinders University, Bedford Park, SA, 5042, Australia; Evolutionary Biology Unit, South Australian Museum, Adelaide, SA, 5000, Australia; School of Biological Sciences, University of Adelaide, Adelaide, SA, 5005, Australia.
- Michael Gardner
- College of Science and Engineering, Flinders University, Bedford Park, SA, 5042, Australia; Evolutionary Biology Unit, South Australian Museum, Adelaide, SA, 5000, Australia.
|
43
|
Lv X, Jin Y, Wang Y. De novo transcriptome assembly and identification of salt-responsive genes in sugar beet M14. Comput Biol Chem 2018; 75:1-10. [PMID: 29705503 DOI: 10.1016/j.compbiolchem.2018.04.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/25/2017] [Revised: 01/06/2018] [Accepted: 04/21/2018] [Indexed: 11/21/2022]
Abstract
Sugar beet (Beta vulgaris) is an important crop for sugar production worldwide. Previous studies reported that sugar beet monosomic addition line M14, obtained from the intercross between Beta vulgaris L. (cultivated species) and B. corolliflora Zoss (wild species), exhibited tolerance to salt (up to 0.5 M NaCl) stress. Identifying a broad spectrum of genes involved in M14 salt tolerance will help elucidate the molecular mechanisms underlying salt stress. Comparative transcriptomics was performed to monitor genes differentially expressed in the leaf and root samples of the sugar beet M14 seedlings treated with 0, 200 and 400 mM NaCl, respectively. Digital gene expression revealed that 3856 unigenes in leaves and 7157 unigenes in roots were differentially expressed under salt stress. Enrichment analysis of the differentially expressed genes based on GO and KEGG databases showed that in both leaves and roots genes related to regulation of redox balance, signal transduction, and protein phosphorylation were differentially expressed. Comparison of gene expression in the leaf and root samples treated with 200 and 400 mM NaCl revealed different mechanisms for coping with salt stress. In addition, the expression levels of nine unigenes in the reactive oxygen species (ROS) scavenging system exhibited significant differences in the leaves and roots. Our transcriptomics results have provided new insights into the salt-stress responses in the leaves and roots of sugar beet.
|
44
|
Abstract
Thyroid hormones are pleiotropic hormones involved in chordate physiology. Understanding their functions and mechanisms is also instrumental in diagnosing dysregulation and gaining predictive power that can be applied to medicine, ecology, etc. Today, high-throughput sequencing technologies offer the opportunity to address this issue not only in model organisms but also in non-model organisms. Here, we describe a method that makes use of RNA-seq to address differential expression analysis in non-model organisms.
Affiliation(s)
- Nicolas Buisine
- Function and Mechanism of Action of Thyroid Hormone Receptor group, UMR 7221 CNRS and Muséum National d'Histoire Naturelle, Sorbonne Universités, Paris, France
- Gwenneg Kerdivel
- Function and Mechanism of Action of Thyroid Hormone Receptor group, UMR 7221 CNRS and Muséum National d'Histoire Naturelle, Sorbonne Universités, Paris, France
- Laurent M Sachs
- Function and Mechanism of Action of Thyroid Hormone Receptor group, UMR 7221 CNRS and Muséum National d'Histoire Naturelle, Sorbonne Universités, Paris, France.
|
45
|
Abstract
Proper control of microRNA (miRNA) expression is critical for normal development and physiology, while abnormal miRNA expression is a common feature of many diseases. Dissecting mechanisms of miRNA regulation, however, is complicated by the generally poor annotation of miRNA primary transcripts (pri-miRNAs). Although some miRNAs are processed from well-defined protein coding genes, the majority of pri-miRNAs are poorly characterized noncoding RNAs, with incomplete annotation of promoters, splice sites, and polyadenylation signals. Due to the efficiency of DROSHA processing, the abundance of pri-miRNAs is very low at steady state, thereby complicating the elucidation of pri-miRNA structures. Here we describe a strategy to enrich intact pri-miRNAs and improve their coverage in RNA sequencing (RNA-seq) experiments. In addition, we outline a computational approach for reconstruction of pri-miRNA structures. This pipeline begins with raw RNA-seq reads and concludes with publication-ready visualization of pri-miRNA annotations. Together, these approaches allow the user to define and explore miRNA gene structures in a cell-type or organism of interest.
Affiliation(s)
- Tsung-Cheng Chang
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Joshua T Mendell
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA.
|
46
|
Kollmar M, Simm D. Identifying Sequenced Eukaryotic Genomes and Transcriptomes with diArk. Methods Mol Biol 2018; 1757:1-19. [PMID: 29761453 DOI: 10.1007/978-1-4939-7737-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The diArk Eukaryotic Genome Database is a manually curated and updated repository of available eukaryotic genome and transcriptome assemblies. diArk is a key resource for researchers interested in comparative eukaryotic genomics, and the entry point for browsing sequenced eukaryotes in general and for finding the species most closely related to one's own organism of interest in particular. The exponentially increasing number of sequenced species demands sophisticated search and data presentation tools. In this chapter we describe how to navigate the diArk database keeping a first-time user in mind.
Collapse
Affiliation(s)
- Martin Kollmar
- Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany.
- Dominic Simm
- Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
- Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University, Göttingen, Germany
47
Stavrianakou M, Perez R, Wu C, Sachs MS, Aramayo R, Harlow M. Draft de novo transcriptome assembly and proteome characterization of the electric lobe of Tetronarce californica: a molecular tool for the study of cholinergic neurotransmission in the electric organ. BMC Genomics 2017; 18:611. [PMID: 28806931 PMCID: PMC5557070 DOI: 10.1186/s12864-017-3890-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/24/2016] [Accepted: 06/21/2017] [Indexed: 11/10/2022] Open
Abstract
Background The electric organ of Tetronarce californica (an electric ray formerly known as Torpedo californica) is a classic preparation for biochemical studies of cholinergic neurotransmission. To broaden the usefulness of this preparation, we have performed a transcriptome assembly of the presynaptic component of the electric organ (the electric lobe). We combined our assembled transcriptome with a previous transcriptome of the postsynaptic electric organ, to define a MetaProteome containing pre- and post-synaptic components of the electric organ. Results Sequencing yielded 102 million paired-end 100 bp reads. De novo Trinity assembly was performed at Kmer 25 (default) and Kmers 27, 29, and 31. Trinity generated around 103,000 transcripts and 78,000 genes per assembly. Assemblies were evaluated based on the number of bases/transcripts assembled, RSEM-EVAL scores and informational content and completeness. We found that different assemblies scored differently according to the evaluation criteria used, and that while each individual assembly contained unique information, much of the assembly information was shared by all assemblies. To generate the presynaptic transcriptome (electric lobe), while capturing all information, assemblies were first clustered and then combined with postsynaptic transcripts (electric organ) downloaded from NCBI. The completeness of the resulting clustered predicted MetaProteome was rigorously evaluated by comparing its information against the predicted proteomes from Homo sapiens, Callorhinchus milii, and the Transporter Classification Database (TCDB). Conclusions In summary, we obtained a MetaProteome containing 92%, 88.5%, and 66% of the expected set of ultra-conserved sequences (i.e., BUSCOs), expected to be found for Eukaryotes, Metazoa, and Vertebrata, respectively. We cross-annotated the conserved set of proteins shared between the T. californica MetaProteome and the proteomes of H. sapiens and C. milii, using the H. sapiens genome as a reference. This information was used to predict the position in human pathways of the conserved members of the T. californica MetaProteome. We found proteins not detected before in T. californica, corresponding to processes involved in synaptic vesicle biology. Finally, we identified 42 transporter proteins in TCDB that were detected by the T. californica MetaProteome (electric fish) and not selected by a control proteome consisting of the combined proteomes of 12 widely diverse non-electric fishes by Reverse-Blast-Hit Blast. Combined, the information provided here is not only a unique tool for the study of cholinergic neurotransmission, but it is also a starting point for understanding the evolution of early vertebrates. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3890-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maria Stavrianakou
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA
- Ricardo Perez
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA
- Cheng Wu
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA
- Matthew S Sachs
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA
- Rodolfo Aramayo
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA.
- Mark Harlow
- Department of Biology, Texas A&M University, 3258 TAMU, College Station, 77843-3258, USA.
48
Abstract
Background With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their applications to small data sets with few libraries. Results We develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with a moderate amount of physical memory. Conclusions Our strategy minimizes memory consumption while simultaneously obtaining comparable or improved accuracy over existing algorithms. It provides support for incremental updates of assemblies when new libraries become available.
Affiliation(s)
- Sing-Hoi Sze
- Department of Computer Science and Engineering, Texas A&M University, College Station, 77843, TX, USA; Department of Biochemistry & Biophysics, Texas A&M University, College Station, 77843, TX, USA.
- Meaghan L Pimsler
- Department of Entomology, Texas A&M University, College Station, 77843, TX, USA
- Jeffery K Tomberlin
- Department of Entomology, Texas A&M University, College Station, 77843, TX, USA
- Corbin D Jones
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Aaron M Tarone
- Department of Entomology, Texas A&M University, College Station, 77843, TX, USA
49
Hoang NV, Furtado A, Mason PJ, Marquardt A, Kasirajan L, Thirugnanasambandam PP, Botha FC, Henry RJ. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics 2017; 18:395. [PMID: 28532419 PMCID: PMC5440902 DOI: 10.1186/s12864-017-3757-8] [Citation(s) in RCA: 131] [Impact Index Per Article: 18.7] [Received: 10/04/2016] [Accepted: 05/03/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms. RESULTS The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues, of different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% was novel transcripts, and over 2% was putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment and public databases showed that the Iso-Seq method recovered more full-length transcript isoforms, had a higher N50 and average length of largest 1,000 proteins; whereas a greater representation of the gene content and RNA diversity was captured in RNA-Seq. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs, while the non-matched proportions were attributed to the inclusion of leaf/root tissues and the normalization in PacBio, and the representation of more gene content and RNA classes in the de novo assembly, respectively. 
About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating the high conservation of orthologs in the genic regions of the two genomes. CONCLUSIONS The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions; and will serve as a reference database for analysis of transcript expression in sugarcane.
Affiliation(s)
- Nam V Hoang
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia; College of Agriculture and Forestry, Hue University, Hue, Vietnam
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
- Patrick J Mason
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
- Annelie Marquardt
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia; Sugar Research Australia, Indooroopilly, QLD, 4068, Australia
- Lakshmi Kasirajan
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia; ICAR - Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
- Prathima P Thirugnanasambandam
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia; ICAR - Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
- Frederik C Botha
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia; Sugar Research Australia, Indooroopilly, QLD, 4068, Australia
- Robert J Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia.
50
Cribbin KM, Quackenbush CR, Taylor K, Arias-Rodriguez L, Kelley JL. Sex-specific differences in transcriptome profiles of brain and muscle tissue of the tropical gar. BMC Genomics 2017; 18:283. [PMID: 28388875 PMCID: PMC5383948 DOI: 10.1186/s12864-017-3652-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/29/2016] [Accepted: 03/22/2017] [Indexed: 02/06/2023] Open
Abstract
Background The tropical gar (Atractosteus tropicus) is the southernmost species of the seven extant species of gar fishes in the world. In Mexico and Central America, the species is an important food source due to its nutritional quality and low price. Despite its regional importance and increasing concerns about overexploitation and habitat degradation, basic genetic information on the tropical gar is lacking. Determining genetic information on the tropical gar is important for the sustainable management of wild populations, implementation of best practices in aquaculture settings, evolutionary studies of ancient lineages, and an understanding of sex-specific gene expression. In this study, the transcriptome of the tropical gar was sequenced and assembled de novo using tissues from three males and three females using Illumina sequencing technology. Sex-specific and highly differentially expressed transcripts in brain and muscle tissues between adult males and females were subsequently identified. Results The transcriptome was assembled de novo resulting in 80,611 transcripts with a contig N50 of 3,355 base pairs and over 168 kilobases in total length. Male muscle, brain, and gonad as well as female muscle and brain were included in the assembly. The assembled transcriptome was annotated to identify the putative function of expressed transcripts using Trinotate and SwissProt, a database of well-annotated proteins. The brain and muscle datasets were then aligned to the assembled transcriptome to identify transcripts that were differentially expressed between males and females. The contrast between male and female brain identified 109 transcripts from 106 genes that were significantly differentially expressed. In the muscle comparison, 82 transcripts from 80 genes were identified with evidence for significant differential expression. Almost all genes identified as differentially expressed were sex-specific. 
The differentially expressed transcripts were enriched for genes involved in cellular functioning, signaling, immune response, and tissue-specific functions. Conclusions This study identified differentially expressed transcripts between male and female gar in muscle and brain tissue. The majority of differentially expressed transcripts had sex-specific expression. Expanding on these findings to other developmental stages, populations, and species may lead to the identification of genetic factors contributing to the skewed sex ratio seen in the tropical gar and of sex-specific differences in expression in other species. Finally, the transcriptome assembly will open future research avenues on tropical gar development, cell function, environmental resistance, and evolution in the context of other early vertebrates. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3652-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kayla M Cribbin
- School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA
- Corey R Quackenbush
- School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA
- Kyle Taylor
- School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA
- Lenin Arias-Rodriguez
- División Académica de Ciencias Biológicas, Universidad Juárez Autónoma de Tabasco (UJAT), C.P. 86150, Villahermosa, Tabasco, Mexico
- Joanna L Kelley
- School of Biological Sciences, Washington State University, Pullman, WA, 99164, USA.