1
Kang JN, Hur M, Kim CK, Yang SH, Lee SM. Enhancing transcriptome analysis in medicinal plants: multiple unigene sets in Astragalus membranaceus. Frontiers in Plant Science 2024; 15:1301526. [PMID: 38384760 PMCID: PMC10879423 DOI: 10.3389/fpls.2024.1301526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
Astragalus membranaceus is a medicinal plant mainly used in East Asia and contains abundant secondary metabolites. Despite the importance of this plant, the available genomic and genetic information is still limited. De novo transcriptome construction is recognized as an essential method for transcriptome research when reference genome information is incomplete. In this study, we constructed three individual transcriptome sets (unigene sets) for detailed analysis of the phenylpropanoid biosynthesis pathway, which produces major metabolites of A. membranaceus. Set-1 consisted of circular consensus sequences (CCSs) generated using PacBio sequencing (PacBio-seq). Set-2 consisted of unigenes produced by hybrid assembly of Illumina sequencing (Illumina-seq) reads and PacBio CCSs using rnaSPAdes. Set-3 unigenes were assembled from Illumina-seq reads using the Trinity software. Construction of multiple unigene sets provides several advantages for transcriptome analysis. First, it provides an appropriate expression filtering threshold for assembly-based unigenes: a threshold of transcripts per million (TPM) ≥ 5 removed more than 88% of assembly-based unigenes, which were mostly short and low-expressing unigenes. Second, assembly-based unigenes compensated for the incomplete length of PacBio CCSs: the ends of the 5′/3′ untranslated regions of phenylpropanoid-related unigenes derived from Set-1 were incomplete, which suggests that PacBio CCSs are unlikely to be full-length transcripts. Third, more isoform unigenes could be obtained from multiple unigene sets; isoform unigenes missing in Set-1 were detected in Set-2 and Set-3. Finally, gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that phenylpropanoid biosynthesis and carbohydrate metabolism were highly activated in A. membranaceus roots. Various sequencing technologies and assemblers have been developed for de novo transcriptome analysis. However, no technique is perfect for de novo transcriptome analysis, suggesting the need to construct multiple unigene sets. This method enables efficient transcript filtering and detection of longer and more diverse transcripts.
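The TPM ≥ 5 expression filter mentioned in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: the unigene names, read counts, and lengths below are hypothetical, and TPM is computed with the standard length-normalised definition.

```python
def tpm(counts, lengths_bp):
    """Return transcripts per million (TPM) for each unigene.

    counts: mapped read count per unigene (hypothetical values here).
    lengths_bp: unigene length in base pairs.
    """
    # Reads per kilobase: normalise each count by transcript length.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no mapped reads; TPM is undefined")
    # Scale so that all TPM values sum to one million.
    return {g: value / total * 1e6 for g, value in rpk.items()}


def filter_by_tpm(tpm_values, threshold=5.0):
    """Keep only unigenes whose TPM is at or above the threshold."""
    return {g: v for g, v in tpm_values.items() if v >= threshold}


# Hypothetical toy data: three unigenes of equal length.
counts = {"unigene_a": 900_000, "unigene_b": 3, "unigene_c": 100_000}
lengths_bp = {"unigene_a": 1000, "unigene_b": 1000, "unigene_c": 1000}

values = tpm(counts, lengths_bp)
kept = filter_by_tpm(values, threshold=5.0)
print(sorted(kept))
```

In this toy input, `unigene_b` falls below the threshold and is dropped, while the two highly expressed unigenes are kept.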
Affiliation(s)
- Ji-Nam Kang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Mok Hur
- Department of Herbal Crop Resources, National Institute of Horticultural & Herbal Science, Eumseong-gun, Chungcheongbuk-do, Republic of Korea
- Chang-Kug Kim
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- So-Hee Yang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Si-Myung Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
2
Zhang W, Yang Y, Hua S, Ruan Q, Li D, Wang L, Wang X, Wen X, Liu X, Meng Z. Chromosome-level genome assembly and annotation of the yellow grouper, Epinephelus awoara. Sci Data 2024; 11:151. [PMID: 38296995 PMCID: PMC10830450 DOI: 10.1038/s41597-024-02989-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 01/18/2024] [Indexed: 02/02/2024]
Abstract
Epinephelus awoara, also known as the yellow grouper, is an economically significant marine fish that has been bred artificially in China. However, the genetic structure and evolutionary history of the yellow grouper remain largely unknown. This work presents a high-quality chromosome-level genome assembly of the yellow grouper using PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The assembled chromosome-level genome was 984.48 Mb, with a contig N50 length of 39.77 Mb and a scaffold N50 length of 41.39 Mb. Approximately 99.76% of assembled sequences were anchored into 24 pseudo-chromosomes with the assistance of Hi-C reads. Furthermore, approximately 41.17% of the genome was composed of repetitive elements. In total, 24,541 protein-coding genes were predicted, of which 22,509 (91.72%) genes were functionally annotated. The highly accurate, chromosome-level reference genome assembly and annotation are crucial to the understanding of population genetic structure, adaptive evolution and speciation of the yellow grouper.
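The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation, using made-up contig lengths rather than data from the paper:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of
        # the total assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must contain at least one sequence")


# Hypothetical contig lengths (arbitrary units), total = 54.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Sorting once and accumulating from the longest sequence keeps the calculation simple; real assembly tools report the same statistic alongside related ones such as L50.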
Affiliation(s)
- Weiwei Zhang
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Yang Yang
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Key Laboratory of Tropical Marine Fish Germplasm Innovation and Utilization, Ministry of Agriculture and Rural Affairs, Sanya, 570000, China
- Hainan Engineering Research Center for Germplasm Innovation and Utilization, Sanya, 570000, China
- Sijie Hua
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Qingxin Ruan
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Duo Li
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Le Wang
- Molecular Population Genetics Group, Temasek Life Sciences Laboratory, National University of Singapore, Singapore City, 119077, Singapore
- Xi Wang
- Area of Ecology and Biodiversity, School of Biological Sciences, University of Hong Kong, Hong Kong SAR, 999077, China
- Xin Wen
- School of Marine Biology and Fisheries, Hainan Aquaculture Breeding Engineering Research Center, Hainan Academician Team Innovation Center, Hainan University, Haikou, 570228, China
- Xiaochun Liu
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Southern Laboratory of Ocean Science and Engineering (Zhuhai), Zhuhai, 519000, China
- Zining Meng
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Province Key Laboratory of Aquatic Economic Animals, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China.
- Southern Laboratory of Ocean Science and Engineering (Zhuhai), Zhuhai, 519000, China.
3
Cai L, Liu D, Yang F, Zhang R, Yun Q, Dao Z, Ma Y, Sun W. The chromosome-scale genome of Magnolia sinica (Magnoliaceae) provides insights into the conservation of plant species with extremely small populations (PSESP). Gigascience 2024; 13:giad110. [PMID: 38206588 PMCID: PMC10999834 DOI: 10.1093/gigascience/giad110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 07/28/2023] [Accepted: 12/04/2023] [Indexed: 01/12/2024]
Abstract
Magnolia sinica (Magnoliaceae) is a highly threatened tree endemic to southeast Yunnan, China. In this study, we generated for the first time a high-quality chromosome-scale genome sequence from M. sinica, by combining Illumina and ONT data with Hi-C scaffolding methods. The final assembled genome size of M. sinica was 1.84 Gb, with a contig N50 of ca. 45 Mb and scaffold N50 of 92 Mb. Identified repeats constituted approximately 57% of the genome, and 43,473 protein-coding genes were predicted. Phylogenetic analysis shows that the magnolias form a sister clade with the eudicots and the order Ceratophyllales, while the monocots are sister to the other core angiosperms. In our study, a total of 21 individuals from the 5 remnant populations of M. sinica, as well as 22 specimens belonging to 8 related Magnoliaceae species, were resequenced. The results showed that M. sinica had higher genetic diversity (θw = 0.01126 and θπ = 0.01158) than other related species in the Magnoliaceae. However, population structure analysis suggested that the genetic differentiation among the 5 M. sinica populations was very low. Analyses of the demographic history of the species using different models consistently revealed that 2 bottleneck events occurred. The contemporary effective population size of M. sinica was estimated to be 10.9. The different patterns of genetic loads (inbreeding and numbers of deleterious mutations) suggested constructive strategies for the conservation of these 5 different populations of M. sinica. Overall, this high-quality genome will be a valuable genomic resource for conservation of M. sinica.
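The two diversity statistics reported in this abstract, Watterson's θw and nucleotide diversity θπ, can be illustrated on a toy alignment. The sketch below assumes aligned haploid sequences of equal length with no missing data, and the sequences themselves are invented; the study's resequencing-based estimates would have been produced with dedicated population-genetics software.

```python
from itertools import combinations


def watterson_theta(seqs):
    """Per-site Watterson's estimator: S / (a_n * L)."""
    n, length = len(seqs), len(seqs[0])
    # S: number of alignment columns with more than one allele.
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # a_n: harmonic number over n - 1 terms.
    a_n = sum(1 / i for i in range(1, n))
    return segregating / a_n / length


def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise differences divided by length."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


# Hypothetical alignment: 4 sequences, 10 sites, 2 segregating sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
theta_w = watterson_theta(toy)
theta_pi = nucleotide_diversity(toy)
print(round(theta_w, 4), round(theta_pi, 4))
```

Under neutrality and constant population size the two estimators are expected to agree, which is why they are usually reported side by side.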
Affiliation(s)
- Lei Cai
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Detuan Liu
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Fengmao Yang
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Rengang Zhang
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- University of Chinese Academy of Sciences, 100049 Beijing, China
- Quanzheng Yun
- Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, 261000, Shandong, China
- Zhiling Dao
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Yongpeng Ma
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Weibang Sun
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
4
Al-Dossary O, Furtado A, KharabianMasouleh A, Alsubaie B, Al-Mssallem I, Henry RJ. Long read sequencing to reveal the full complexity of a plant transcriptome by targeting both standard and long workflows. Plant Methods 2023; 19:112. [PMID: 37865785 PMCID: PMC10589961 DOI: 10.1186/s13007-023-01091-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 10/13/2023] [Indexed: 10/23/2023]
Abstract
BACKGROUND Long read sequencing allows the analysis of full-length transcripts in plants without the challenges of reliable transcriptome assembly. Long read sequencing of transcripts from plant genomes has often utilized sized transcript libraries. However, the value of including libraries of differing sizes has not been established. METHODS A comprehensive transcriptome of the leaves of Jojoba (Simmondsia chinensis) was generated from two different PacBio library preparations: standard workflow (SW) and long workflow (LW). RESULTS The importance of using both transcript groups in the analysis was demonstrated by the high proportion of unique sequences (74.6%) that were not shared between the groups. A total of 37.8% longer transcripts were only detected in the long dataset. The completeness of the combined transcriptome was indicated by the presence of 98.7% of genes predicted in the jojoba male reference genome. The high coverage of the transcriptome was further confirmed by BUSCO analysis showing the presence of 96.9% of the genes from the core viridiplantae_odb10 lineage. The high-quality isoforms post Cd-Hit merged dataset of the two workflows had a total of 167,866 isoforms. Most of the transcript isoforms were protein-coding sequences (71.7%) containing open reading frames (ORFs) ≥ 100 amino acids (aa). Alternative splicing and intron retention were the basis of most transcript diversity when analysed at the whole genome level and by specific analysis of the apetala2 gene families. CONCLUSION This suggests the need to specifically target the capture of longer transcripts to provide more comprehensive genome coverage in plant transcriptome analysis and reveal the high level of alternative splicing.
Affiliation(s)
- Othman Al-Dossary
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- Ardashir KharabianMasouleh
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- Bader Alsubaie
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia
- College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Ibrahim Al-Mssallem
- College of Agriculture and Food Sciences, King Faisal University, 36362, Al Hofuf, Saudi Arabia
- Robert J Henry
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, 4072, Australia.
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, 4072, Australia.
5
Ren Y, Tseng E, Smith TPL, Hiendleder S, Williams JL, Low WY. Long read isoform sequencing reveals hidden transcriptional complexity between cattle subspecies. BMC Genomics 2023; 24:108. [PMID: 36915055 PMCID: PMC10012480 DOI: 10.1186/s12864-023-09212-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2022] [Accepted: 02/27/2023] [Indexed: 03/16/2023]
Abstract
The Iso-Seq method of full-length cDNA sequencing is suitable for quantifying differentially expressed genes (DEGs), differentially expressed transcripts (DETs) and differential transcript usage (DTU). However, the higher cost of Iso-Seq relative to RNA-seq has limited comparison of the two methods. Transcript abundance estimated from RNA-seq and deep Iso-Seq data for fetal liver from two cattle subspecies was compared to evaluate concordance. Inter-sample correlation of gene- and transcript-level abundance was higher within a technology than between technologies. Identification of DEGs between the cattle subspecies depended on the sequencing method, with only 44 genes identified by both, including 6 novel genes annotated by Iso-Seq. There was a pronounced difference between Iso-Seq and RNA-seq results at the transcript level, where Iso-Seq revealed several orders of magnitude more transcript abundance and usage differences between subspecies. Factors influencing DEG identification included size selection during Iso-Seq library preparation, average transcript abundance, multi-mapping of RNA-seq reads to the reference genome, and overlapping coordinates of genes. Some DEGs called by RNA-seq alone appear to be sequence duplication artifacts. Among the 44 DEGs identified by both technologies, some play a role in the immune system, thyroid function and cell growth. Iso-Seq revealed hidden transcriptional complexity in DEGs, DETs and DTU genes between cattle subspecies previously missed by RNA-seq.
Affiliation(s)
- Yan Ren
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, Adelaide, SA, 5371, Australia
- Timothy P L Smith
- U.S. Meat Animal Research Center, USDA-ARS, Clay Center, Clay Center, Nebraska, USA
- Stefan Hiendleder
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, Adelaide, SA, 5371, Australia
- Robinson Research Institute, The University of Adelaide, North Adelaide, Adelaide, SA, 5006, Australia
- John L Williams
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, Adelaide, SA, 5371, Australia
- Department of Animal Science, Food and Nutrition, Università Cattolica del Sacro Cuore, 29122, Piacenza, Italy
- Wai Yee Low
- The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, Adelaide, SA, 5371, Australia.
6
Brouze A, Krawczyk PS, Dziembowski A, Mroczek S. Measuring the tail: Methods for poly(A) tail profiling. Wiley Interdisciplinary Reviews. RNA 2023; 14:e1737. [PMID: 35617484 PMCID: PMC10078590 DOI: 10.1002/wrna.1737] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/11/2022] [Revised: 04/13/2022] [Accepted: 04/15/2022] [Indexed: 01/31/2023]
Abstract
The 3'-end poly(A) tail is an important and potent feature of most mRNA molecules that affects mRNA fate and translation efficiency. Polyadenylation is a posttranscriptional process carried out in the nucleus by canonical poly(A) polymerases (PAPs). In some specific instances, the poly(A) tail can also be extended in the cytoplasm by noncanonical poly(A) polymerases (ncPAPs). This epitranscriptomic regulation of mRNA recently became one of the most interesting aspects in the field. Advances in RNA sequencing technologies and software development have allowed the precise measurement of poly(A) tails, identification of new ncPAPs, expansion of the function of known enzymes, discovery and a better understanding of the physiological role of tail heterogeneity, and recognition of a correlation between tail length and RNA translatability. Here, we summarize the development of polyadenylation research methods, including classic low-throughput approaches, Illumina-based genome-wide analysis, and advanced state-of-the-art techniques that utilize long-read third-generation sequencing with Pacific Biosciences and Oxford Nanopore Technologies platforms. A boost in technical opportunities over recent decades has allowed a better understanding of the regulation of gene expression at the mRNA level. This article is categorized under: RNA Methods > RNA Analyses In Vitro and In Silico.
Affiliation(s)
- Aleksandra Brouze
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Paweł Szczepan Krawczyk
- Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Andrzej Dziembowski
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
- Department of Embryology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Seweryn Mroczek
- Institute of Genetics and Biotechnology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Laboratory of RNA Biology, International Institute of Molecular and Cell Biology, Warsaw, Poland
7
Linscott TM, González-González A, Hirano T, Parent CE. De novo genome assembly and genome skims reveal LTRs dominate the genome of a limestone endemic Mountainsnail (Oreohelix idahoensis). BMC Genomics 2022; 23:796. [PMID: 36460988 PMCID: PMC9719178 DOI: 10.1186/s12864-022-09000-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/24/2022] [Accepted: 11/10/2022] [Indexed: 12/05/2022]
Abstract
BACKGROUND Calcareous outcrops, rocky areas composed of calcium carbonate (CaCO3), often host a diverse, specialized, and threatened biomineralizing fauna. Despite the repeated evolution of physiological and morphological adaptations to colonize these mineral-rich substrates, there is a lack of genomic resources for calcareous rock endemic species. This has hampered our ability to understand the genomic mechanisms underlying calcareous rock specialization and to manage these threatened species. RESULTS Here, we present a new draft genome assembly of the threatened limestone endemic land snail Oreohelix idahoensis and genome skim data for two other Oreohelix species. The O. idahoensis genome assembly (scaffold N50: 404.19 kb; 86.6% BUSCO genes) is the largest (~5.4 Gb) and most repetitive mollusc genome assembled to date (85.74% of assembly size). The repetitive landscape was unusually dominated by an expansion of long terminal repeat (LTR) transposable elements (57.73% of assembly size), which have shaped the evolution of genome size and of gene composition through retrotransposition of host genes and ectopic recombination. Genome skims revealed that repeat content is more than 2-3 fold higher in limestone endemic O. idahoensis compared to non-calcareous Oreohelix species. Gene family size analysis revealed that stress and biomineralization genes have expanded significantly in the O. idahoensis genome. CONCLUSIONS Hundreds of threatened land snail species are endemic to calcareous rock regions, but there are very few genomic resources available to guide their conservation or determine the genomic architecture underlying CaCO3 resource specialization. Our study provides one of the first high-quality draft genomes of a calcareous rock endemic land snail, which will serve as a foundation for the conservation genomics of this threatened species and for other groups. The high proportion and activity of LTRs in the O. idahoensis genome is unprecedented in molluscan genomics and sheds new light on how transposable element content can vary across molluscs. The genomic resources reported here will enable further studies of the genomic mechanisms underlying calcareous rock specialization and the evolution of transposable element content across molluscs.
Affiliation(s)
- T. Mason Linscott
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA
- Institute for Interdisciplinary Data Sciences, University of Idaho, Moscow, ID, USA
- Andrea González-González
- Department of Biology, University of Florida, Gainesville, Florida, USA
- Takahiro Hirano
- Center for Northeast Asian Studies, Tohoku University, Sendai, Miyagi, Japan
- Christine E. Parent
- Department of Biological Sciences, University of Idaho, Moscow, ID, USA
- Institute for Interdisciplinary Data Sciences, University of Idaho, Moscow, ID, USA
8
Wang YW, Nambeesan SU. Full-length fruit transcriptomes of southern highbush (Vaccinium sp.) and rabbiteye (V. virgatum Ait.) blueberry. BMC Genomics 2022; 23:733. [DOI: 10.1186/s12864-022-08935-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 10/06/2022] [Indexed: 11/10/2022]
Abstract
Background
Blueberries (Vaccinium sp.) are native to North America and breeding efforts to improve blueberry fruit quality are focused on improving traits such as increased firmness, enhanced flavor and greater shelf-life. Such efforts require additional genomic resources, especially in southern highbush and rabbiteye blueberries.
Results
We generated the first full-length fruit transcriptome for the southern highbush and rabbiteye blueberry using the cultivars, Suziblue and Powderblue, respectively. The transcriptome was generated using the Pacific Biosciences single-molecule long-read isoform sequencing platform with cDNA pooled from seven stages during fruit development and postharvest storage. Raw reads were processed through the Isoseq pipeline and full-length transcripts were mapped to the ‘Draper’ genome with unmapped reads collapsed using Cogent. Finally, we identified 16,299 and 15,882 non-redundant transcripts in ‘Suziblue’ and ‘Powderblue’ respectively by combining the reads mapped to Northern Highbush blueberry ‘Draper’ genome and Cogent analysis. In both cultivars, > 80% of sequences were longer than 1,000 nt, with the median transcript length around 1,700 nt. Functionally annotated transcripts using Blast2GO were > 92% in both ‘Suziblue’ and ‘Powderblue’ with overall equal distribution of gene ontology (GO) terms in the two cultivars. Analyses of alternative splicing events indicated that around 40% non-redundant sequences exhibited more than one isoform. Additionally, long non-coding RNAs were predicted to represent 5.6% and 7% of the transcriptomes in ‘Suziblue’ and ‘Powderblue’, respectively. Fruit ripening is regulated by several hormone-related genes and transcription factors. Among transcripts associated with phytohormone metabolism/signaling, the highest number of transcripts were related to abscisic acid (ABA) and auxin metabolism followed by those for brassinosteroid, jasmonic acid and ethylene metabolism. Among transcription factor-associated transcripts, those belonging to ripening-related APETALA2/ethylene-responsive element-binding factor (AP2/ERF), NAC (NAM, ATAF1/2 and CUC2), leucine zipper (HB-zip), basic helix-loop-helix (bHLH), MYB (v-MYB, discovered in avian myeloblastosis virus genome) and MADS-Box gene families, were abundant.
Further, we measured three fruit ripening quality traits and indicators (ABA concentration, anthocyanin concentration, and texture) during fruit development and ripening. ABA concentration increased during the initial stages of fruit ripening and then declined at the Ripe stage, whereas anthocyanin content increased during the final stages of fruit ripening in both cultivars. Fruit firmness declined during ripening in ‘Powderblue’. Genes associated with the above parameters were identified using the full-length transcriptome. Transcript abundance patterns of these genes were consistent with changes in the fruit ripening and quality-related characteristics.
Conclusions
A full-length, well-annotated fruit transcriptome was generated for two blueberry species commonly cultivated in the southeastern United States. The robustness of the transcriptome was verified by the identification and expression analyses of multiple fruit ripening and quality–regulating genes. The full-length transcriptome is a valuable addition to the blueberry genomic resources and will aid in further improving the annotation. It will also provide a useful resource for the investigation of molecular aspects of ripening and postharvest processes.
9
Ahmed YW, Alemu BA, Bekele SA, Gizaw ST, Zerihun MF, Wabalo EK, Teklemariam MD, Mihrete TK, Hanurry EY, Amogne TG, Gebrehiwot AD, Berga TN, Haile EA, Edo DO, Alemu BD. Epigenetic tumor heterogeneity in the era of single-cell profiling with nanopore sequencing. Clin Epigenetics 2022; 14:107. [PMID: 36030244 PMCID: PMC9419648 DOI: 10.1186/s13148-022-01323-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2022] [Accepted: 08/12/2022] [Indexed: 11/29/2022]
Abstract
Nanopore sequencing has brought the technology to the next generation in the science of sequencing. This is achieved through research advancing on: pore efficiency, creating mechanisms to control DNA translocation, enhancing signal-to-noise ratio, and expanding to long-read ranges. Heterogeneity regarding epigenetics would be broad as mutations in the epigenome are sensitive to cause new challenges in cancer research. Epigenetic enzymes which catalyze DNA methylation and histone modification are dysregulated in cancer cells and cause numerous heterogeneous clones to evolve. Detection of this heterogeneity in these clones plays an indispensable role in the treatment of various cancer types. With single-cell profiling, the nanopore sequencing technology could provide a simple sequence at long reads and is expected to be used soon at the bedside or doctor's office. Here, we review the advancements of nanopore sequencing and its use in the detection of epigenetic heterogeneity in cancer.
Affiliation(s)
- Yohannis Wondwosen Ahmed
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia.
- Berhan Ababaw Alemu
- Department of Medical Biochemistry, School of Medicine, St. Paul's Hospital, Millennium Medical College, Addis Ababa, Ethiopia
- Sisay Addisu Bekele
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Solomon Tebeje Gizaw
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Muluken Fekadie Zerihun
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Endriyas Kelta Wabalo
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Maria Degef Teklemariam
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Tsehayneh Kelemu Mihrete
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
- Endris Yibru Hanurry
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
| | - Tensae Gebru Amogne
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
| | - Assaye Desalegne Gebrehiwot
- Department of Medical Anatomy, School of Medicine, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
| | - Tamirat Nida Berga
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
| | - Ebsitu Abate Haile
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
| | - Dessiet Oma Edo
- Department of Medical Biochemistry, School of Medicine, College of Health Sciences, Addis Ababa University, P.O. Box: 9086, Addis Ababa, Ethiopia
| | - Bizuwork Derebew Alemu
- Department of Statistics, College of Natural and Computational Sciences, Mizan Tepi University, Tepi, Ethiopia
10
Hedouin S, Logsdon GA, Underwood JG, Biggins S. A transcriptional roadblock protects yeast centromeres. Nucleic Acids Res 2022; 50:7801-7815. [PMID: 35253883 PMCID: PMC9371891 DOI: 10.1093/nar/gkac117] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/07/2021] [Revised: 02/02/2022] [Accepted: 02/07/2022] [Indexed: 11/12/2022] Open
Abstract
Centromeres are the chromosomal loci essential for faithful chromosome segregation during cell division. Although centromeres are transcribed and produce non-coding RNAs (cenRNAs) that affect centromere function, we still lack a mechanistic understanding of how centromere transcription is regulated. Here, using a targeted RNA isoform sequencing approach, we identified the transcriptional landscape at and surrounding all centromeres in budding yeast. Overall, cenRNAs are derived from transcription readthrough of pericentromeric regions but rarely span the entire centromere and are a complex mixture of molecules that are heterogeneous in abundance, orientation, and sequence. While most pericentromeres are transcribed throughout the cell cycle, centromere accessibility to the transcription machinery is restricted to S-phase. This temporal restriction is dependent on Cbf1, a centromere-binding transcription factor, that we demonstrate acts locally as a transcriptional roadblock. Cbf1 deletion leads to an accumulation of cenRNAs at all phases of the cell cycle which correlates with increased chromosome mis-segregation that is partially rescued when the roadblock activity is restored. We propose that a Cbf1-mediated transcriptional roadblock protects yeast centromeres from untimely transcription to ensure genomic stability.
Affiliation(s)
- Sabrine Hedouin
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Jason G Underwood
- Pacific Biosciences (PacBio) of California, Incorporated, Menlo Park, CA 94025, USA
- Sue Biggins
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
11
Mo Y, Jiao Y. Advances and applications of single-cell omics technologies in plant research. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 110:1551-1563. [PMID: 35426954 DOI: 10.1111/tpj.15772] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/26/2022] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 06/14/2023]
Abstract
Single-cell sequencing approaches reveal the intracellular dynamics of individual cells and answer biological questions with high-dimensional catalogs of millions of cells, including genomics, transcriptomics, chromatin accessibility, epigenomics, and proteomics data across species. These emerging yet thriving technologies have been fully embraced by the field of plant biology, with a constantly expanding portfolio of applications. Here, we introduce the current technical advances used for single-cell omics, especially single-cell genome and transcriptome sequencing. Firstly, we overview methods for protoplast and nucleus isolation and genome and transcriptome amplification. Subsequently, we use well-executed benchmarking studies to highlight advances made through the application of single-cell omics techniques. Looking forward, we offer a glimpse of additional hurdles and future opportunities that will introduce broad adoption of single-cell sequencing with revolutionary perspectives in plant biology.
Affiliation(s)
- Yajin Mo
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing, 100871, China
- School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Yuling Jiao
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Center for Quantitative Biology, School of Life Sciences, Peking University, Beijing, 100871, China
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Beijing, 100101, China
12
Huang D, Liu W, Hu Q, Li H, Wang C. The Histone Acetyltransferase HpGCN5 Involved in the Regulation of Abiotic Stress Responses and Astaxanthin Accumulation in Haematococcus pluvialis. FRONTIERS IN PLANT SCIENCE 2022; 13:903764. [PMID: 35668806 PMCID: PMC9163953 DOI: 10.3389/fpls.2022.903764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 04/19/2022] [Indexed: 06/15/2023]
Abstract
The histone acetyltransferases (HATs), together with histone deacetylases, regulate the gene transcription related to various biological processes, including stress responses in eukaryotes. This study identified a HAT family member (HpGCN5) from a transcriptome of the economically important microalga Haematococcus pluvialis. Its expression pattern in response to multiple abiotic stresses and its correlation with transcription factors and genes involved in triacylglycerol and astaxanthin biosynthesis under stress conditions were evaluated, with the aim of discovering its potential biological function. The isolated HpGCN5 was 1,712 bp in length, encoding 415 amino acids. The signature domains Acetyltransf_1 and BROMO were present, as in the GCN5 genes from Arabidopsis and Saccharomyces cerevisiae, confirming that HpGCN5 belongs to the GCN5 subfamily of the GNAT superfamily. The phylogenetic analysis revealed that HpGCN5 groups with GNAT genes from algae and is closer to those from higher plants than to those from yeast, animals, fungi, and bacteria. HpGCN5 was predicted to comprise 10 exons and to contain multiple stress-related cis-elements in its promoter region, suggesting a potential role in stress regulation. Real-time quantitative PCR revealed that HpGCN5 responds similarly to high light and high salt stresses, as evidenced by its down-regulation upon exposure to both. In contrast, HpGCN5 expression was significantly induced by SA and nitrogen-depletion stresses at the early stage but subsequently declined. The correlation network analysis suggested that HpGCN5 has a strong correlation with major genes and a transcription factor involved in astaxanthin biosynthesis. By comparison, HpGCN5 correlated with only a few genes involved in triacylglycerol biosynthesis. Therefore, this study proposed that HpGCN5 might play a role in the regulation of astaxanthin biosynthesis. This study provides a first examination of the role of HATs in stress regulation, and the results will enrich our understanding of the role of HATs in microalgae.
Affiliation(s)
- Danqiong Huang
- Shenzhen Key Laboratory of Marine Bioresource and Eco-Environmental Science, Shenzhen Engineering Laboratory for Marine Algal Biotechnology, Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
- Wenfu Liu
- Shenzhen Key Laboratory of Marine Bioresource and Eco-Environmental Science, Shenzhen Engineering Laboratory for Marine Algal Biotechnology, Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
- Qunju Hu
- Marine Resources Big Data Center of South China Sea, Southern Marine Science and Engineering Guangdong Laboratory, Zhanjiang, China
- Hui Li
- Shenzhen Key Laboratory of Marine Bioresource and Eco-Environmental Science, Shenzhen Engineering Laboratory for Marine Algal Biotechnology, Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
- Chaogang Wang
- Shenzhen Key Laboratory of Marine Bioresource and Eco-Environmental Science, Shenzhen Engineering Laboratory for Marine Algal Biotechnology, Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, China
13
Singh P, Ahi EP. The importance of alternative splicing in adaptive evolution. Mol Ecol 2022; 31:1928-1938. [DOI: 10.1111/mec.16377] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/20/2021] [Revised: 01/06/2022] [Accepted: 01/25/2022] [Indexed: 11/26/2022]
Affiliation(s)
- Pooja Singh
- Department of Biological Sciences, University of Calgary, Calgary, Canada
- Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Swiss Federal Institute of Aquatic Science and Technology (EAWAG), Kastanienbaum, Switzerland
- Ehsan Pashay Ahi
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
14
Sun M, Zhao Y, Shao X, Ge J, Tang X, Zhu P, Wang J, Zhao T. EST-SSR Marker Development and Full-Length Transcriptome Sequence Analysis of Tiger Lily (Lilium lancifolium Thunb). Appl Bionics Biomech 2022; 2022:7641048. [PMID: 35126662 PMCID: PMC8816598 DOI: 10.1155/2022/7641048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/07/2021] [Revised: 01/03/2022] [Accepted: 01/12/2022] [Indexed: 12/11/2022] Open
Abstract
The fast advancement and deployment of sequencing technologies after the Human Genome Project have greatly increased our knowledge of the eukaryotic genome sequences. However, due to technological concerns, high-quality genomic data has been confined to a few key organisms. Moreover, our understanding of which portions of genomes make up genes and which transcript isoforms synthesize these genes is scarce. Therefore, the current study has been designed to explore the reliability of the tiger lily (Lilium lancifolium Thunb) transcriptome. The PacBio-SMRT was used for attaining the complete transcriptomic profile. We obtained a total of 815,624 CCS (Circular Consensus Sequence) reads with an average length of 1295 bp. The tiger lily transcriptome has been sequenced for the first time using third-generation long-read technology. Furthermore, unigenes (38,707), lncRNAs (6852), and TF members (768) were determined based on the transcriptome data, followed by evaluating SSRs (3319). It has also been revealed that 105 out of 128 primer pairs effectively amplified PCR products. Around 15,608 transcripts were allocated to 25 distinct KOG Clusters, and 10,706 unigenes were grouped into 52 functional categories in the annotated transcripts. Until now, no tiger lily lncRNAs have been discovered. Results of this study may serve as an extensive set of reference transcripts and help us learn more about the transcriptomes of tiger lilies and pave the path for further research.
Affiliation(s)
- Mingwei Sun
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Xiaobin Shao
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Jintao Ge
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Xueyan Tang
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Pengbo Zhu
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Jiangying Wang
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Tongli Zhao
- Lianyungang Academy of Agricultural Sciences, Lianyungang, China
15
Qu Z, Jia Y, Duan Y, Chen H, Wang X, Zheng H, Liu H, Wang J, Zou D, Zhao H. Integrated Isoform Sequencing and Dynamic Transcriptome Analysis Reveals Diverse Transcripts Responsible for Low Temperature Stress at Anther Meiosis Stage in Rice. FRONTIERS IN PLANT SCIENCE 2021; 12:795834. [PMID: 34975985 PMCID: PMC8718874 DOI: 10.3389/fpls.2021.795834] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Accepted: 11/30/2021] [Indexed: 06/14/2023]
Abstract
Low-temperature stress is one of the important factors limiting rice yield, especially during rice anther development, and can cause pollen sterility and decrease grain yield. In our study, low-temperature stress decreased pollen viability and spikelet fertility by affecting the sugar, nitrogen and amino acid contents of anthers. We performed RNA-seq and ISO-seq experiments to study the genome-wide transcript expression profiles in low-temperature anthers. A total of 4,859 differentially expressed transcripts were detected between the low-temperature and control groups. Gene ontology enrichment analysis revealed significant terms related to cold tolerance. Hexokinase and glutamate decarboxylase participating in starch and sucrose metabolism may play important roles in the response to cold stress. Using weighted gene co-expression network analysis, nine hub transcripts were found that could improve cold tolerance throughout the meiosis period of rice: Os02t0219000-01 (interferon-related developmental regulator protein), Os01t0218350-00 (tetratricopeptide repeat-containing thioredoxin), Os08t0197700-00 (luminal-binding protein 5), Os11t0200000-01 (histone deacetylase 19), Os03t0758700-01 (WD40 repeat domain-containing protein), Os06t0220500-01 (7-deoxyloganetin glucosyltransferase), Pacbio.T01382 (sucrose synthase 1), Os01t0172400-01 (phospholipase D alpha 1), and Os01t0261200-01 (NAC domain-containing protein 74). In the PPI network, the protein minichromosome maintenance 4 (MCM4) may play an important role in DNA replication induced by cold stress.
Affiliation(s)
- Hongwei Zhao
- Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Ministry of Education, Northeast Agricultural University, Harbin, China
16
Wang Y, Li X, Wang C, Gao L, Wu Y, Ni X, Sun J, Jiang J. Unveiling the transcriptomic complexity of Miscanthus sinensis using a combination of PacBio long read- and Illumina short read sequencing platforms. BMC Genomics 2021; 22:690. [PMID: 34551715 PMCID: PMC8459517 DOI: 10.1186/s12864-021-07971-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2020] [Accepted: 09/03/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Miscanthus sinensis Andersson is a perennial grass that exhibits remarkable lignocellulose characteristics suitable for sustainable bioenergy production. However, knowledge of the genetic resources of this species is relatively limited, which considerably hampers further work on its biology and genetic improvement. Results: In this study, through analyzing the transcriptome of mixed samples of leaves and stems using the latest PacBio Iso-Seq sequencing technology combined with Illumina HiSeq, we report the first full-length transcriptome dataset of M. sinensis with a total of 58.21 Gb clean data. An average of 15.75 Gb clean reads of each sample were obtained from the PacBio Iso-Seq system, which doubled the data size (6.68 Gb) obtained from the Illumina HiSeq platform. The integrated analyses of PacBio- and Illumina-based transcriptomic data uncovered 408,801 non-redundant transcripts with an average length of 1,685 bp. Of those, 189,406 transcripts were commonly identified by both methods, 169,149 transcripts with an average length of 619 bp were uniquely identified by Illumina HiSeq, and 51,246 transcripts with an average length of 2,535 bp were uniquely identified by PacBio Iso-Seq. Approximately 96% of the final combined transcripts were mapped back to the Miscanthus genome, reflecting the high quality and coverage of our sequencing results. When comparing our data with genomes of four species of Andropogoneae, M. sinensis showed the closest relationship with sugarcane with up to 93% mapping ratios, followed by sorghum with up to 80% mapping ratios, indicating a high conservation of orthologs in these three genomes. Furthermore, 306,228 transcripts were successfully annotated against public databases including cell wall related genes and transcription factor families, thus providing many new insights into gene functions. The PacBio Iso-Seq data also helped identify 3,898 alternative splicing events and 2,963 annotated AS isoforms within 10 function categories. Conclusions: Taken together, the present study provides a rich data set of full-length transcripts that greatly enriches our understanding of M. sinensis transcriptomic resources, thus facilitating further genetic improvement and molecular studies of the Miscanthus species. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07971-x.
Affiliation(s)
- Yongli Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xia Li
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Congsheng Wang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Lu Gao
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Yanfang Wu
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Xingnan Ni
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianzhong Sun
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
- Jianxiong Jiang
- Biofuels Institute, School of the Environment and Safety Engineering, Jiangsu University, 212013, Zhenjiang, Jiangsu, China
17
Ma H, Liu Y, Liu D, Sun W, Liu X, Wan Y, Zhang X, Zhang R, Yun Q, Wang J, Li Z, Ma Y. Chromosome-level genome assembly and population genetic analysis of a critically endangered rhododendron provide insights into its conservation. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 107:1533-1545. [PMID: 34189793 DOI: 10.1111/tpj.15399] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/08/2020] [Accepted: 06/23/2021] [Indexed: 05/25/2023]
Abstract
Rhododendrons are woody plants, famous throughout the world as having high horticultural value. However, many wild species are currently threatened with extinction. Here, we report for the first time a high-quality, chromosome-level genome of Rhododendron griersonianum, which has contributed to approximately 10% of all horticultural rhododendron varieties but which in its wild form has been evaluated as critically endangered. The final genome assembly, which has a contig N50 size of approximately 34 M and a total length of 677 M, is the highest-quality genome sequenced within the genus to date, in part due to its low heterozygosity (0.18%). Identified repeats constitute approximately 57% of the genome, and 38 280 protein-coding genes were predicted with high support. We further resequenced 31 individuals of R. griersonianum as well as 30 individuals of its widespread relative R. delavayi, and performed additional conservation genomic analysis. The results showed that R. griersonianum had lower genetic diversity (θ = 2.58e-3; π = 1.94e-3) when compared not only to R. delavayi (θ = 11.61e-3, π = 12.97e-3), but also to most other woody plants. Furthermore, three severe genetic bottlenecks were detected using both the Stairway plot and fastsimcoal2 analysis, which are thought to have occurred in the late Middle Pleistocene and the Last Glacial Maximum (LGM) period. After these bottlenecks, R. griersonianum recovered and maintained a constant effective population size (>25 000) until now. Intriguingly, R. griersonianum has accumulated significantly more deleterious mutations in the homozygous state than R. delavayi, and several deleterious mutations (e.g., in genes involved in the response to heat stress) are likely to have harmed the adaptation of this plant to its surroundings. This high-quality, chromosome-level genome and the population genomic analysis of the critically endangered R. griersonianum will provide an invaluable resource as well as insights for future study in this species to facilitate conservation and in the genus Rhododendron in general.
Affiliation(s)
- Hong Ma
- Research Institute of Resources Insects, Chinese Academy of Forestry, Kunming, 650233, China
- Yongbo Liu
- State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing, 100012, China
- Detuan Liu
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Weibang Sun
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Xiongfang Liu
- Research Institute of Resources Insects, Chinese Academy of Forestry, Kunming, 650233, China
- Youming Wan
- Research Institute of Resources Insects, Chinese Academy of Forestry, Kunming, 650233, China
- Xiujiao Zhang
- Research Institute of Resources Insects, Chinese Academy of Forestry, Kunming, 650233, China
- Rengang Zhang
- Beijing Ori-Gene Science and Technology Co. Ltd, Beijing, 102206, China
- Quanzheng Yun
- Beijing Ori-Gene Science and Technology Co. Ltd, Beijing, 102206, China
- Jihua Wang
- The Flower Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, 650205, China
- National Engineering Research Center for Ornamental Horticulture, Kunming, 650205, China
- Zhenghong Li
- Research Institute of Resources Insects, Chinese Academy of Forestry, Kunming, 650233, China
- Yongpeng Ma
- Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
18
De Paoli-Iseppi R, Gleeson J, Clark MB. Isoform Age - Splice Isoform Profiling Using Long-Read Technologies. Front Mol Biosci 2021; 8:711733. [PMID: 34409069 PMCID: PMC8364947 DOI: 10.3389/fmolb.2021.711733] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/19/2021] [Accepted: 07/19/2021] [Indexed: 01/12/2023] Open
Abstract
Alternative splicing (AS) of RNA is a key mechanism that results in the expression of multiple transcript isoforms from single genes and leads to an increase in the complexity of both the transcriptome and proteome. Regulation of AS is critical for the correct functioning of many biological pathways, while disruption of AS can be directly pathogenic in diseases such as cancer or cause risk for complex disorders. Current short-read sequencing technologies achieve high read depth but are limited in their ability to resolve complex isoforms. In this review we examine how long-read sequencing (LRS) technologies can address this challenge by covering the entire RNA sequence in a single read and thereby distinguish isoform changes that could impact RNA regulation or protein function. Coupling LRS with technologies such as single cell sequencing, targeted sequencing and spatial transcriptomics is producing a rapidly expanding suite of technological approaches to profile alternative splicing at the isoform level with unprecedented detail. In addition, integrating LRS with genotype now allows the impact of genetic variation on isoform expression to be determined. Recent results demonstrate the potential of these techniques to elucidate the landscape of splicing, including in tissues such as the brain where AS is particularly prevalent. Finally, we also discuss how AS can impact protein function, potentially leading to novel therapeutic targets for a range of diseases.
Affiliation(s)
- Michael B. Clark
- Centre for Stem Cell Systems, Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC, Australia
19
Zhao Z, Elsik CG, Hibbard BE, Shelby KS. Detection of alternative splicing in western corn rootworm (Diabrotica virgifera virgifera LeConte) in association with eCry3.1Ab resistance using RNA-seq and PacBio Iso-Seq. INSECT MOLECULAR BIOLOGY 2021; 30:436-445. [PMID: 33955085 DOI: 10.1111/imb.12709] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/11/2021] [Accepted: 05/01/2021] [Indexed: 06/12/2023]
Abstract
Alternative splicing is a common feature in eukaryotes that not only increases transcript diversity but also has functional consequences. In insects, alternative splicing has been found to be associated with resistance to pesticides and Bt toxins. To date, alternative splicing in western corn rootworm (Diabrotica virgifera virgifera LeConte) has not been studied. To investigate its alternative splicing pattern and relation to Bt resistance, we carried out single-molecule real-time (SMRT) transcript sequencing and Iso-seq analysis on resistant, eCry3.1Ab-selected and susceptible, unselected, western corn rootworm neonate midguts which fed on seedling maize with and without eCry3.1Ab for 12 and 24 h. We present transcriptome-wide alternative splicing patterns of western corn rootworm midgut in response to feeding on eCry3.1Ab-expressing corn using a comprehensive approach that combines both RNA-seq and SMRT transcript sequencing techniques. The results showed that genes in western corn rootworm are highly alternatively spliced, with alternative splicing occurring in 67.73% of multi-exon genes. One of the alternative splicing events we identified was a novel peritrophic matrix protein with two alternative splicing isoforms. Analysis of differential exon usage between resistant and susceptible colonies showed that in eCry3.1Ab-resistant western corn rootworm, expression of one isoform was significantly higher than in the susceptible colony, while no significant differences between colonies were observed with the other isoform. Our results provide the first survey of alternative splicing in western corn rootworm and suggest that the observed alternatively spliced isoforms of peritrophic matrix protein may be associated with eCry3.1Ab resistance in western corn rootworm.
Affiliation(s)
- Z Zhao
- Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- C G Elsik
- Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- Division of Animal Sciences, University of Missouri, Columbia, MO, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO, USA
- B E Hibbard
- Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- USDA-ARS Plant Genetics Research Unit, Columbia, MO, USA
- K S Shelby
- Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- USDA-ARS Biological Control of Insects Research Laboratory, Columbia, MO, USA
20
Shi X, Neuwald AF, Wang X, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles. Bioinformatics 2021; 37:650-658. [PMID: 33016988 PMCID: PMC8097681 DOI: 10.1093/bioinformatics/btaa852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2019] [Revised: 08/27/2020] [Accepted: 09/21/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. RESULTS: We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits a robust performance among multiple samples, resulting in notably improved identification of expressed isoforms of low abundance. AVAILABILITY AND IMPLEMENTATION: The IntAPT package is available at http://github.com/henryxushi/IntAPT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xu Shi
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Xiao Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Robert Clarke
- Hormel Institute, University of Minnesota, 801 16th Ave NE, Austin, MN 55912, USA
- Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
| |
Collapse
|
21
|
Global transcriptome changes of elongating internode of sugarcane in response to mepiquat chloride. BMC Genomics 2021; 22:79. [PMID: 33494722 PMCID: PMC7831198 DOI: 10.1186/s12864-020-07352-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/28/2020] [Accepted: 12/27/2020] [Indexed: 11/10/2022] Open
Abstract
Background Mepiquat chloride (DPC) is a chemical that is extensively used to control internode growth and create compact canopies in cultured plants. Previous studies have suggested that DPC could also inhibit gibberellin biosynthesis in sugarcane. Unfortunately, the molecular mechanism underlying the suppressive effects of DPC on plant growth is still largely unknown. Results In the present study, we first obtained high-quality long transcripts from the internodes of sugarcane using the PacBio Sequel System. A total of 72,671 isoforms, with N50 at 3073, were generated. These long isoforms were used as a reference for the subsequent RNA-seq. Afterwards, short reads generated from the Illumina HiSeq 4000 platform were used to compare the differentially expressed genes in both the DPC and the control groups. Transcriptome profiling showed that most significant gene changes occurred after six days post DPC treatment. These genes were related to plant hormone signal transduction and biosynthesis of several metabolites, indicating that DPC affected multiple pathways, in addition to suppressing gibberellin biosynthesis. The network of DPC on the key stage was illustrated by weighted gene co-expression network analysis (WGCNA). Among the 36 constructed modules, the top positive correlated module, at the stage of six days post spraying DPC, was sienna3. Notably, Stf0 sulfotransferase, cyclin-like F-box, and HOX12 were the hub genes in sienna3 that had high correlation with other genes in this module. Furthermore, the qPCR validated the high accuracy of the RNA-seq results. Conclusion Taken together, we have demonstrated the key role of these genes in DPC-induced growth inhibition in sugarcane. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-020-07352-w.
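The N50 quoted in this abstract (3073 for 72,671 isoforms) is the length L such that sequences of length L or longer account for at least half of all bases. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths in the usage note are illustrative assumptions, not values or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must not be empty")
```

For example, `n50([2, 3, 4, 5, 6])` returns 5, because the two longest sequences (6 + 5 = 11) already cover half of the 20 total bases.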
|
22
|
Teterina AA, Willis JH, Phillips PC. Chromosome-Level Assembly of the Caenorhabditis remanei Genome Reveals Conserved Patterns of Nematode Genome Organization. Genetics 2020; 214:769-780. [PMID: 32111628 PMCID: PMC7153949 DOI: 10.1534/genetics.119.303018] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/30/2019] [Accepted: 02/24/2020] [Indexed: 12/23/2022] Open
Abstract
The nematode Caenorhabditis elegans is one of the key model systems in biology, including possessing the first fully assembled animal genome. Whereas C. elegans is a self-reproducing hermaphrodite with fairly limited within-population variation, its relative C. remanei is an outcrossing species with much more extensive genetic variation, making it an ideal parallel model system for evolutionary genetic investigations. Here, we greatly improve on previous assemblies by generating a chromosome-level assembly of the entire C. remanei genome (124.8 Mb of total size) using long-read sequencing and chromatin conformation capture data. Like other fully assembled genomes in the genus, we find that the C. remanei genome displays a high degree of synteny with C. elegans despite multiple within-chromosome rearrangements. Both genomes have high gene density in central regions of chromosomes relative to chromosome ends and the opposite pattern for the accumulation of repetitive elements. C. elegans and C. remanei also show similar patterns of interchromosome interactions, with the central regions of chromosomes appearing to interact with one another more than the distal ends. The new C. remanei genome presented here greatly augments the use of the Caenorhabditis as a platform for comparative genomics and serves as a basis for molecular population genetics within this highly diverse species.
Affiliation(s)
- Anastasia A Teterina
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
  - Center of Parasitology, A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow 117071, Russia
- John H Willis
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
- Patrick C Phillips
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
|
23
|
Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020; 21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 759] [Impact Index Per Article: 189.8] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022] Open
Abstract
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Affiliation(s)
- Shanika L. Amarasinghe
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Shian Su
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Xueyi Dong
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- Luke Zappia
  - Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
  - School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
- Matthew E. Ritchie
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
  - School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
- Quentin Gouil
  - Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
  - Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
|
24
|
Carvalho DS, Nishimwe AV, Schnable JC. IsoSeq transcriptome assembly of C3 panicoid grasses provides tools to study evolutionary change in the Panicoideae. PLANT DIRECT 2020; 4:e00203. [PMID: 32128472 PMCID: PMC7047018 DOI: 10.1002/pld3.203] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/02/2019] [Revised: 01/14/2020] [Accepted: 01/16/2020] [Indexed: 06/10/2023]
Abstract
The number of plant species with genomic and transcriptomic data has been increasing rapidly. The grasses (Poaceae) have been well represented among species with published reference genomes. However, as a result the genomes of wild grasses are less frequently targeted by sequencing efforts. Sequence data from wild relatives of crop species in the grasses can aid the study of domestication, gene discovery for breeding and crop improvement, and improve our understanding of the evolution of C4 photosynthesis. Here, we used long-read sequencing technology to characterize the transcriptomes of three C3 panicoid grass species: Dichanthelium oligosanthes, Chasmanthium laxum, and Hymenachne amplexicaulis. Based on alignments to the sorghum genome, we estimate that assembled consensus transcripts from each species capture between 54.2% and 65.7% of the conserved syntenic gene space in grasses. Genes co-opted into C4 were also well represented in this dataset, despite concerns that because these genes might play roles unrelated to photosynthesis in the target species, they would be expressed at low levels and missed by transcript-based sequencing. A combined analysis using syntenic orthologous genes from grasses with published reference genomes and consensus long-read sequences from these wild species was consistent with previously published phylogenies. It is hoped that these data, targeting underrepresented classes of species within the PACMAD grasses (wild species and species utilizing C3 photosynthesis), will aid in future studies of domestication and C4 evolution by decreasing the evolutionary distance between C4 and C3 species within this clade, enabling more accurate comparisons associated with evolution of the C4 pathway.
Affiliation(s)
- Daniel S. Carvalho
  - Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska‐Lincoln, Lincoln, NE, USA
- Aime V. Nishimwe
  - Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska‐Lincoln, Lincoln, NE, USA
- James C. Schnable
  - Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska‐Lincoln, Lincoln, NE, USA
|
25
|
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Universite Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
|
26
|
Detection of Abrin-Like and Prepropulchellin-Like Toxin Genes and Transcripts Using Whole Genome Sequencing and Full-Length Transcript Sequencing of Abrus precatorius. Toxins (Basel) 2019; 11:toxins11120691. [PMID: 31775284 PMCID: PMC6950105 DOI: 10.3390/toxins11120691] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/10/2019] [Revised: 11/13/2019] [Accepted: 11/19/2019] [Indexed: 11/21/2022] Open
Abstract
The sequenced genome and the leaf transcriptome of a near relative of Abrus pulchellus and Abrus precatorius was analyzed to characterize the genetic basis of toxin gene expression. From the high-quality genome assembly, a total of 26 potential coding regions were identified that contain genes with abrin-like, pulchellin-like, and agglutinin-like homology, with full-length transcripts detected in leaf tissue for 9 of the 26 coding regions. All of the toxin-like genes were identified within only five isolated regions of the genome, with each region containing 1 to 16 gene variants within each genomic region (<1 Mbp). The Abrus precatorius cultivar sequenced here contains genes which encode for proteins that are homologous to certain abrin and prepropulchellin genes previously identified, and we observed substantial diversity of genes and predicted gene products in Abrus precatorius and previously characterized toxins. This suggests diverse toxin repertoires within Abrus, potentially the results of rapid toxin evolution.
|
27
|
Balázs Z, Tombácz D, Csabai Z, Moldován N, Snyder M, Boldogkői Z. Template-switching artifacts resemble alternative polyadenylation. BMC Genomics 2019; 20:824. [PMID: 31703623 PMCID: PMC6839120 DOI: 10.1186/s12864-019-6199-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/10/2019] [Accepted: 10/17/2019] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. RESULTS Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. CONCLUSIONS Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.
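The abstract reports that polyadenylation artifacts arise at stretches of as few as three consecutive adenines. As a rough illustration only, the check below flags a candidate poly(A) site when the genomic sequence next to it contains such a run. The function name `has_a_run`, its `min_run` parameter, and the example sequences are hypothetical; this is not the authors' filtering algorithm, which also accounts for template switching.

```python
import re


def has_a_run(flanking_seq, min_run=3):
    """Return True if flanking_seq contains >= min_run consecutive adenines.

    flanking_seq is assumed to be the genomic sequence adjacent to a
    candidate poly(A) site; matching is case-insensitive.
    """
    pattern = "A{%d,}" % min_run  # e.g. "A{3,}" for three or more As
    return re.search(pattern, flanking_seq.upper()) is not None
```

Under these assumptions, `has_a_run("GCAAATG")` is true while `has_a_run("GCAATAG")` is false, so only the first site would be treated as a possible artifact and passed on for closer inspection.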
Affiliation(s)
- Zsolt Balázs
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Dóra Tombácz
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Zsolt Csabai
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Norbert Moldován
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- Michael Snyder
  - Department of Genetics, School of Medicine, Stanford University, Stanford, CA, USA
- Zsolt Boldogkői
  - Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
|
28
|
Soneson C, Yao Y, Bratus-Neuenschwander A, Patrignani A, Robinson MD, Hussain S. A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes. Nat Commun 2019; 10:3359. [PMID: 31366910 PMCID: PMC6668388 DOI: 10.1038/s41467-019-11272-z] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 03/11/2019] [Accepted: 07/04/2019] [Indexed: 11/29/2022] Open
Abstract
A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies, but despite initial efforts it remains crucial to further investigate the technology for quantification of complex transcriptomes. Here we undertake native RNA sequencing of polyA+ RNA from two human cell lines, analysing ~5.2 million aligned native RNA reads. To enable informative comparisons, we also perform relevant ONT direct cDNA- and Illumina-sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects currently hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulties to unambiguously infer their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.
Affiliation(s)
- Charlotte Soneson
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 8057, Zurich, Switzerland
  - Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Yao Yao
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 8057, Zurich, Switzerland
- Andrea Patrignani
  - Functional Genomics Centre Zurich, ETHZ/University of Zurich, 8057, Zurich, Switzerland
- Mark D Robinson
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 8057, Zurich, Switzerland
- Shobbir Hussain
  - Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
29
|
Mohindra V, Dangi T, Chowdhury LM, Jena JK. Tissue specific alpha-2-Macroglobulin (A2M) splice isoform diversity in Hilsa shad, Tenualosa ilisha (Hamilton, 1822). PLoS One 2019; 14:e0216144. [PMID: 31335900 PMCID: PMC6650032 DOI: 10.1371/journal.pone.0216144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/29/2018] [Accepted: 04/15/2019] [Indexed: 12/12/2022] Open
Abstract
The present study, for the first time, reported twelve A2M isoforms in Tenualosa ilisha, through SMRT sequencing. Hilsa shad, T. ilisha, an anadromous fish, faces environmental stresses and is thus prone to diseases. Here, expression profiles of different A2M isoforms in four tissues were studied in T. ilisha, for the tissue specific diversity of A2M. Large scale high quality full length transcripts (>0.99% accuracy) were obtained from liver, ovary, testes and gill transcriptomes, through Iso-sequencing on PacBio RSII. A total of 12 isoforms, with complete putative proteins, were detected in three tissues (7 isoforms in liver, 4 in ovary and 1 in testes). Complete structure of A2M mRNA was predicted from these isoforms, containing 4680 bp sequence, 35 exons and 1508 amino acids. With Homo sapiens A2M as reference, six functional domains (A2M_N, A2M_N2, A2M, Thiol-ester_cl, Complement and Receptor domain), along with a bait region, were predicted in A2M consensus protein. A total of 35 splice sites were identified in T. ilisha A2M consensus transcript, with highest frequency (55.7%) of GT-AG splice sites, as compared to that of Homo sapiens. Liver showed longest isoform (X1) consisting of all domains, while smallest (X10) was found in ovary with one Receptor domain. Present study predicted five putative markers (I-212, I-269, A-472, S-567 and Y-906) for EUS disease resistance in A2M protein, which were present in MG2 domains (A2M_N and A2M_N2), by comparing with that of resistant and susceptible/unknown response species. These markers classified fishes into two groups, resistant and susceptible response. Potential markers, predicted in T. ilisha, placed it in the EUS-susceptible category. Putative markers reported in A2M protein may serve as molecular markers in diagnosis of EUS disease resistance/susceptibility in fishes and may have a potential for inclusion in the marker panel for pilot studies. Further, challenging studies are required to confirm the role of particular A2M isoforms and markers identified in immune protection against EUS disease.
Affiliation(s)
- Vindhya Mohindra
  - ICAR-National Bureau of Fish Genetic Resources (ICAR-NBFGR), Lucknow, India
- Tanushree Dangi
  - ICAR-National Bureau of Fish Genetic Resources (ICAR-NBFGR), Lucknow, India
- J. K. Jena
  - Indian Council of Agricultural Research (ICAR), Krishi Anusandhan Bhawan-II, New Delhi, India
30
|
Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, Love MI, Patro R, Robinson MD. RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021255] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 11/09/2022]
Abstract
Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.
Affiliation(s)
- Koen Van den Berge
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Katharina M. Hembach
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Charlotte Soneson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Simone Tiberi
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
- Lieven Clement
  - Bioinformatics Institute Ghent and Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Michael I. Love
  - Department of Biostatistics and Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27514, USA
- Rob Patro
  - Department of Computer Science, Stony Brook University, Stony Brook, New York 11794, USA
- Mark D. Robinson
  - Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, 8057 Zurich, Switzerland
|
31
|
Yang Z, Ge X, Yang Z, Qin W, Sun G, Wang Z, Li Z, Liu J, Wu J, Wang Y, Lu L, Wang P, Mo H, Zhang X, Li F. Extensive intraspecific gene order and gene structural variations in upland cotton cultivars. Nat Commun 2019; 10:2989. [PMID: 31278252 PMCID: PMC6611876 DOI: 10.1038/s41467-019-10820-x] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 11/14/2018] [Accepted: 06/03/2019] [Indexed: 01/28/2023] Open
Abstract
Multiple cotton genomes (diploid and tetraploid) have been assembled. However, genomic variations between cultivars of allotetraploid upland cotton (Gossypium hirsutum L.), the most widely planted cotton species in the world, remain unexplored. Here, we use single-molecule long read and Hi-C sequencing technologies to assemble genomes of the two upland cotton cultivars TM-1 and zhongmiansuo24 (ZM24). Comparisons among TM-1 and ZM24 assemblies and the genomes of the diploid ancestors reveal a large amount of genetic variations. Among them, the top three longest structural variations are located on chromosome A08 of the tetraploid upland cotton, which account for ~30% total length of this chromosome. Haplotype analyses of the mapping population derived from these two cultivars and the germplasm panel show suppressed recombination rates in this region. This study provides additional genomic resources for the community, and the identified genetic variations, especially the reduced meiotic recombination on chromosome A08, will help future breeding.
Affiliation(s)
- Zhaoen Yang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Xiaoyang Ge
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Zuoren Yang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Wenqiang Qin
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Gaofei Sun
- Anyang Institute of Technology, Anyang, 455000, China
| | - Zhi Wang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Zhi Li
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Ji Liu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Jie Wu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Ye Wang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Lili Lu
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Peng Wang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Huijuan Mo
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Xueyan Zhang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China
| | - Fuguang Li
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China.
- Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, 455000, China.
| |
|
32
|
Zhao L, Zhang H, Kohnen MV, Prasad KVSK, Gu L, Reddy ASN. Analysis of Transcriptome and Epitranscriptome in Plants Using PacBio Iso-Seq and Nanopore-Based Direct RNA Sequencing. Front Genet 2019; 10:253. [PMID: 30949200 PMCID: PMC6438080 DOI: 10.3389/fgene.2019.00253] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 10/15/2018] [Accepted: 03/06/2019] [Indexed: 12/18/2022] Open
Abstract
Nanopore sequencing from Oxford Nanopore Technologies (ONT) and Pacific BioSciences (PacBio) single-molecule real-time (SMRT) long-read isoform sequencing (Iso-Seq) are revolutionizing the way transcriptomes are analyzed. These methods offer many advantages over the most widely used high-throughput short-read RNA sequencing (RNA-Seq) approaches and allow a comprehensive analysis of transcriptomes in identifying full-length splice isoforms and several other post-transcriptional events. In addition, direct RNA-Seq provides valuable information about RNA modifications, which are lost during the PCR amplification step in other methods. Here, we present a comprehensive summary of important applications of these technologies in plants, including identification of complex alternative splicing (AS), full-length splice variants, fusion transcripts, and alternative polyadenylation (APA) events. Furthermore, we discuss the impact of the newly developed nanopore direct RNA-Seq in advancing epitranscriptome research in plants. Additionally, we summarize computational tools for identifying and quantifying full-length isoforms and other co/post-transcriptional events and discuss some of the limitations of these methods. Sequencing of transcriptomes using these new single-molecule long-read methods will unravel many aspects of transcriptome complexity in unprecedented ways as compared to previous short-read sequencing approaches. Analysis of plant transcriptomes with these new powerful methods that require minimum sample processing is likely to become the norm and is expected to uncover novel co/post-transcriptional gene regulatory mechanisms that control biological outcomes during plant development and in response to various stresses.
Affiliation(s)
- Liangzhen Zhao
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hangxiao Zhang
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Markus V. Kohnen
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Kasavajhala V. S. K. Prasad
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Anireddy S. N. Reddy
- Program in Cell and Molecular Biology, Department of Biology, Colorado State University, Fort Collins, CO, United States
| |
|
33
|
Payá-Milans M, Olmstead JW, Nunez G, Rinehart TA, Staton M. Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species. Gigascience 2018; 7:5168871. [PMID: 30418578 PMCID: PMC6275443 DOI: 10.1093/gigascience/giy132] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/19/2018] [Accepted: 10/21/2018] [Indexed: 11/12/2022] Open
Abstract
Background: The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis. Findings: At the preprocessing step, we found error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis by selecting gentle quality trimming performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with size, quality, and completeness of de novo assemblies and with number of mapped reads. When selecting a reference genome from a related species to map reads, the outcome was significantly improved when using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP. Conclusions: The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters on de novo assemblies and availability of reads in downstream analysis.
Affiliation(s)
- Miriam Payá-Milans
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
| | - James W Olmstead
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
| | - Gerardo Nunez
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
| | - Timothy A Rinehart
- Thad Cochran Southern Horticultural Laboratory, USDA-Agricultural Research Service, PO Box 287, Poplarville, MS, 39470, United States; Crop Production and Protection, USDA-Agricultural Research Service, 5601 Sunnyside Ave, Beltsville, MD, 20705, United States
| | - Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
| |
|
34
|
Cheng B, Furtado A, Henry RJ. Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts. Gigascience 2018; 6:1-13. [PMID: 29048540 PMCID: PMC5737654 DOI: 10.1093/gigascience/gix086] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 02/05/2017] [Accepted: 08/23/2017] [Indexed: 11/20/2022] Open
Abstract
Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially in the 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kilobases) were poorly annotated. LRS technology shows the limitations of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee.
Affiliation(s)
- Bing Cheng
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072, Australia
| | - Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072, Australia
| | - Robert J Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, QLD 4072, Australia
| |
|
35
|
Bayega A, Fahiminiya S, Oikonomopoulos S, Ragoussis J. Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing. Methods Mol Biol 2018; 1783:209-241. [PMID: 29767365 DOI: 10.1007/978-1-4939-7834-2_11] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 06/08/2023]
Abstract
The transcriptome encompasses a range of species including messenger RNA and noncoding RNAs such as rRNA, tRNA, and short and long noncoding RNAs. Due to the huge role played by mRNA in development and disease, several methods have been developed to sequence and characterize mRNA, with RNA sequencing (RNA-Seq) emerging as the current method of choice, particularly for large high-throughput studies. Short-read RNA-Seq, which involves sequencing short cDNA fragments and either computationally assembling them to reconstruct the transcriptome or aligning them to a reference, is the most widely used approach. However, due to inherent limitations of this approach in de novo transcriptome assembly and isoform quantification, long-read RNA-Seq approaches, which also happen to be single molecule sequencing approaches, are increasingly becoming the standard for de novo transcriptome assembly and isoform quantification. In this chapter, we review the technical aspects of the current methods of RNA-Seq, both short and long-read approaches, and data analysis methods available. We discuss recent advances in single-cell RNA-Seq and direct RNA-Seq approaches, which perhaps will dominate the future of RNA-Seq.
Affiliation(s)
- Anthony Bayega
- McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | | | - Spyros Oikonomopoulos
- McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada
| | - Jiannis Ragoussis
- McGill University and Genome Quebec Innovation Centre, Department of Human Genetics, McGill University, Montréal, QC, Canada.
- Department of Bioengineering, McGill University, Montréal, QC, Canada.
- Cancer and Mutagen Unit, Department of Biochemistry, Center of Innovation in Personalized Medicine, King Fahd Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia.
| |
|
36
|
Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res 2016; 44:e89. [PMID: 26893356 PMCID: PMC4872089 DOI: 10.1093/nar/gkw092] [Citation(s) in RCA: 386] [Impact Index Per Article: 48.3] [Received: 10/30/2015] [Revised: 02/01/2016] [Accepted: 02/06/2016] [Indexed: 11/15/2022] Open
Abstract
Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservation of intron positions within genes to predict related genes in other organisms. We assess the performance of GeMoMa and compare it with state-of-the-art competitors on plant and animal genomes using an extended best reciprocal hit approach. We find that GeMoMa often makes more precise predictions than its competitors, yielding a substantially increased number of correct transcripts. Subsequently, we exemplarily validate GeMoMa predictions using Sanger sequencing. Finally, we use RNA-seq data to compare the predictions of homology-based gene prediction programs, and find again that GeMoMa performs well. Hence, we conclude that exploiting intron position conservation improves homology-based gene prediction, and we make GeMoMa freely available as a command-line tool and Galaxy integration.
Affiliation(s)
- Jens Keilwagen
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
| | - Michael Wenk
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06120 Halle (Saale), Germany
| | - Jessica L Erickson
- Schattat Lab, Institute of Biology, Martin Luther University Halle-Wittenberg, D-06120 Halle (Saale), Germany
| | - Martin H Schattat
- Schattat Lab, Institute of Biology, Martin Luther University Halle-Wittenberg, D-06120 Halle (Saale), Germany
| | - Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06120 Halle (Saale), Germany
| | - Frank Hartung
- Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
| |
|