1
Shabbir M, Mithani A. Roast: a tool for reference-free optimization of supertranscriptome assemblies. BMC Bioinformatics 2024; 25:2. [PMID: 38166712 PMCID: PMC10763045 DOI: 10.1186/s12859-023-05614-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Transcriptomic studies involving organisms for which reference genomes are not available typically start by generating de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is, however, a challenging task due to significantly varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo supertranscriptome assembly tools generate assemblies containing contigs that are partially-assembled, fragmented, false chimeras or have local mis-assemblies leading to decreased assembly accuracy. Commonly available tools for assembly improvement rely primarily on running BLAST using closely related species making their accuracy and reliability conditioned on the availability of the data for closely related organisms. RESULTS We present ROAST, a tool for optimization of supertranscriptome assemblies that uses paired-end RNA-seq data from Illumina sequencing platform to iteratively identify and fix assembly errors solely using the error signatures generated by RNA-seq alignment tools including soft-clips, unexpected expression coverage, and reads with mates unmapped or mapped on a different contig to identify and fix various supertranscriptome assembly errors without performing BLAST searches against other organisms. Evaluation results using simulated as well as real datasets show that ROAST significantly improves assembly quality by identifying and fixing various assembly errors. CONCLUSION ROAST provides a reference-free approach to optimizing supertranscriptome assemblies highlighting its utility in refining de novo supertranscriptome assemblies of non-model organisms.
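The error signatures named in this abstract (soft-clips, mates unmapped, mates mapped on a different contig) can all be read from standard SAM alignment records. The following is a minimal sketch of that idea, not ROAST's actual implementation: the function names `soft_clip_lengths` and `flag_error_signatures` and the `min_clip` threshold are hypothetical, and only the SAM field layout and flag bits come from the SAM specification.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths for one CIGAR string."""
    # Hard clips (H) may sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def flag_error_signatures(sam_lines, min_clip=10):
    """Count per-contig reads that carry one of three error signatures.

    min_clip is an assumed, illustrative threshold (hypothetical).
    """
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, contig = int(fields[1]), fields[2]
        cigar, mate_contig = fields[5], fields[6]
        if flag & 0x4 or contig == "*":  # the read itself is unmapped
            continue
        entry = counts.setdefault(
            contig,
            {"soft_clipped": 0, "mate_unmapped": 0, "mate_elsewhere": 0},
        )
        if cigar != "*" and max(soft_clip_lengths(cigar)) >= min_clip:
            entry["soft_clipped"] += 1
        if flag & 0x1:  # read is paired
            if flag & 0x8:  # mate unmapped
                entry["mate_unmapped"] += 1
            elif mate_contig not in ("=", "*", contig):
                entry["mate_elsewhere"] += 1
    return counts
```

Contigs with unusually high counts in any of the three categories would be candidates for closer inspection; how such signals are turned into actual fixes is specific to the tool and is not covered by this sketch.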
Affiliation(s)
- Madiha Shabbir
  - Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
- Aziz Mithani
  - Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
2
Kulkarni CC, Cholin SS, Bajpai AK, Ondrasek G, Mesta RK, Rathod S, Patil HB. Comparative Root Transcriptome Profiling and Gene Regulatory Network Analysis between Eastern and Western Carrot (Daucus carota L.) Cultivars Reveals Candidate Genes for Vascular Tissue Patterning. PLANTS (BASEL, SWITZERLAND) 2023; 12:3449. [PMID: 37836190 PMCID: PMC10575051 DOI: 10.3390/plants12193449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 09/26/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023]
Abstract
Carrot (Daucus carota L.) is a highly consumed vegetable rich in carotenoids, known for their potent antioxidant, anti-inflammatory, and immune-protecting properties. While genetic and molecular studies have largely focused on wild and Western carrot cultivars (cvs), little is known about the evolutionary interactions between closely related Eastern and Western cvs. In this study, we conducted comparative transcriptome profiling of root tissues from Eastern (UHSBC-23-1) and Western (UHSBC-100) carrot cv. to better understand differentially expressed genes (DEGs) associated with storage root development and vascular cambium (VC) tissue patterning. Through reference-guided TopHat mapping, we achieved an average mapping rate of 73.87% and identified a total of 3544 DEGs (p < 0.05). Functional annotation and gene ontology classification revealed 97 functional categories, including 33 biological processes, 19 cellular components, 45 metabolic processes, and 26 KEGG pathways. Notably, Eastern cv. exhibited enrichment in cell wall, plant-pathogen interaction, and signal transduction terms, while Western cv. showed dominance in photosynthesis, metabolic process, and carbon metabolism terms. Moreover, constructed gene regulatory network (GRN) for both cvs. obtained orthologs with 1222 VC-responsive genes of Arabidopsis thaliana. In Western cv, GRN revealed VC-responsive gene clusters primarily associated with photosynthetic processes and carbon metabolism. In contrast, Eastern cv. exhibited a higher number of stress-responsive genes, and transcription factors (e.g., MYB15, WRKY46, AP2/ERF TF connected via signaling pathways with NAC036) were identified as master regulators of xylem vessel differentiation and secondary cell wall thickening. By elucidating the comparative transcriptome profiles of Eastern and Western cvs. for the first time, our study provides valuable insights into the differentially expressed genes involved in root development and VC tissue patterning. 
The identification of key regulatory genes and their roles in these processes represents a significant advancement in our understanding of the evolutionary relations and molecular mechanisms underlying secondary growth of carrot and regulation by vascular cambium.
Affiliation(s)
- Chaitra C. Kulkarni
  - Plant Molecular Biology Lab (DBT-BIOCARe), Department of Biotechnology & Crop Improvement, College of Horticulture, University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
  - Kittur Rani Chennamma College of Horticulture, Arabhavi, Gokak 591218, Belgaum Dt., Karnataka, India
  - University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
- Sarvamangala S. Cholin
  - Plant Molecular Biology Lab (DBT-BIOCARe), Department of Biotechnology & Crop Improvement, College of Horticulture, University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
  - University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
- Akhilesh K. Bajpai
  - Shodhaka Life Sciences Pvt. Ltd., Electronic City, Phase-I, Bengaluru 560100, Karnataka, India
- Gabrijel Ondrasek
  - Department of Soil Amelioration, Faculty of Agriculture, University of Zagreb, 10000 Zagreb, Croatia
- R. K. Mesta
  - University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
- Santosha Rathod
  - Indian Institute of Rice Research, Hyderabad 500030, Telangana, India
- H. B. Patil
  - University of Horticultural Sciences, Bagalkot 587103, Karnataka, India
3
Lin Z, Qin Y, Chen H, Shi D, Zhong M, An T, Chen L, Wang Y, Lin F, Li G, Ji ZL. TransIntegrator: capture nearly full protein-coding transcript variants via integrating Illumina and PacBio transcriptomes. Brief Bioinform 2023; 24:bbad334. [PMID: 37779246 DOI: 10.1093/bib/bbad334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/23/2023] [Accepted: 08/30/2023] [Indexed: 10/03/2023]
Abstract
Genes have the ability to produce transcript variants that perform specific cellular functions. However, accurately detecting all transcript variants remains a long-standing challenge, especially when working with poorly annotated genomes or without a known genome. To address this issue, we have developed a new computational method, TransIntegrator, which enables transcriptome-wide detection of novel transcript variants. For this, we determined 10 Illumina sequencing transcriptomes and a PacBio full-length transcriptome for consecutive embryo development stages of amphioxus, a species of great evolutionary importance. Based on the transcriptomes, we employed TransIntegrator to create a comprehensive transcript variant library, namely iTranscriptome. The resulting iTranscriptome contained 91 915 distinct transcript variants, with an average of 2.4 variants per gene. This substantially improved current amphioxus genome annotation by expanding the number of genes from 21 954 to 38 777. Further analysis showed that the gene expansion was largely ascribed to integration of multiple Illumina datasets instead of involving the PacBio data. Moreover, we demonstrated an example application of TransIntegrator, via generating iTranscriptome, in aiding accurate transcriptome assembly, which significantly outperformed other hybrid methods such as IDP-denovo and Trinity. For user convenience, we have deposited the source code of TransIntegrator on GitHub as well as a conda package in Anaconda. In summary, this study proposes an affordable but efficient method for reliable transcriptomic research in most species.
Affiliation(s)
- Zhe Lin
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China
- Yangmei Qin
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Hao Chen
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Dan Shi
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Mindong Zhong
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Te An
  - School of Informatics, Xiamen University, 361005, Xiamen, China
- Linshan Chen
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Yiquan Wang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Fan Lin
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China
  - School of Informatics, Xiamen University, 361005, Xiamen, China
- Guang Li
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
- Zhi-Liang Ji
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, 361102, Xiamen, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, 361102, Xiamen, China
4
Ahmadi H, Sheikh-Assadi M, Fatahi R, Zamani Z, Shokrpour M. Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis. Sci Rep 2023; 13:12415. [PMID: 37524806 PMCID: PMC10390528 DOI: 10.1038/s41598-023-39620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/11/2023] [Accepted: 07/27/2023] [Indexed: 08/02/2023]
Abstract
Non-erroneous and well-optimized transcriptome assembly is a crucial prerequisite for authentic downstream analyses. Each de novo assembler has its own algorithm-dependent pros and cons in handling assembly issues and should be specifically tested for each dataset. Here, we examined the efficiency of seven state-of-the-art assemblers on ~30 Gb of data obtained from mRNA sequencing of Thymus daenensis. In an ensemble workflow, combining the outputs of different assemblers with an additional redundancy-reducing step could generate an optimized outcome in terms of completeness, annotatability, and ORF richness. Based on the normalized scores of 16 benchmarking metrics, the assemblers ranked, from best to worst, as EvidentialGene, BinPacker, Trinity, rnaSPAdes, CAP3, IDBA-trans, and Velvet-Oases. EvidentialGene, as the best assembler, produced a total of 316,786 transcripts, of which 235,730 (74%) were predicted to have a unique protein hit (on uniref100), and half of its transcripts contained an ORF. The total number of unique BLAST hits for EvidentialGene was approximately three times greater than that of the worst assembler (Velvet-Oases). EvidentialGene could even capture 17% and 7% more average BLAST hits than BinPacker and Trinity, respectively. Although BinPacker and CAP3 produced longer transcripts, EvidentialGene showed a higher collinearity between transcript size and ORF length. Compared with the other programs, EvidentialGene yielded a higher number of optimal transcript sets, more full-length transcripts, and fewer possible misassemblies. Our finding corroborates that in non-model species, relying on a single assembler may not give an entirely satisfactory result. Therefore, this study proposes an ensemble approach of accompanying EvidentialGene pipelines to acquire a superior assembly for T. daenensis.
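Ranking assemblers by normalized scores over several benchmarking metrics can be illustrated with a short sketch. The abstract does not say which normalization or weighting was used, so the min-max scaling and equal weights below are assumptions, and the function names, metric names, and values in the usage note are placeholders rather than results from the study.

```python
def min_max_normalize(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when a lower raw value is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All assemblers tie on this metric, so give each a neutral score.
        return [0.5 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_assemblers(metrics, higher_is_better):
    """Rank assemblers by their mean normalized score (equal weights assumed).

    metrics: {metric_name: {assembler_name: raw_value}}
    higher_is_better: {metric_name: bool}
    """
    names = sorted({n for per_metric in metrics.values() for n in per_metric})
    totals = dict.fromkeys(names, 0.0)
    for metric, per_assembler in metrics.items():
        raw = [per_assembler[n] for n in names]
        scores = min_max_normalize(raw, higher_is_better[metric])
        for name, score in zip(names, scores):
            totals[name] += score
    mean = {name: total / len(metrics) for name, total in totals.items()}
    return sorted(mean.items(), key=lambda item: item[1], reverse=True)
```

As a usage example with placeholder numbers, `rank_assemblers({"completeness": {"A": 90, "B": 80, "C": 70}, "redundancy": {"A": 30, "B": 10, "C": 20}}, {"completeness": True, "redundancy": False})` puts assembler B first, because B is mid-range on completeness but best on redundancy.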
Affiliation(s)
- Hosein Ahmadi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Morteza Sheikh-Assadi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Reza Fatahi
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Zabihollah Zamani
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
- Majid Shokrpour
  - Department of Horticulture Science, Faculty of Agriculture and Natural Sciences, University of Tehran, Karaj, Iran
5
Vuruputoor VS, Monyak D, Fetter KC, Webster C, Bhattarai A, Shrestha B, Zaman S, Bennett J, McEvoy SL, Caballero M, Wegrzyn JL. Welcome to the big leaves: Best practices for improving genome annotation in non-model plant genomes. APPLICATIONS IN PLANT SCIENCES 2023; 11:e11533. [PMID: 37601314 PMCID: PMC10439824 DOI: 10.1002/aps3.11533] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/04/2022] [Revised: 02/04/2023] [Accepted: 02/10/2023] [Indexed: 08/22/2023]
Abstract
Premise Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein-coding gene predictions. Methods The impact of repeat masking, long-read and short-read inputs, and de novo and genome-guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. Results Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono-exonic/multi-exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA-read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence-based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome-guided transcriptome assemblies, or full-length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post-processing with functional and structural filters is highly recommended. Discussion While the annotation of non-model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.
Affiliation(s)
- Vidya S. Vuruputoor
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Daniel Monyak
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Karl C. Fetter
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Cynthia Webster
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Akriti Bhattarai
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Bikash Shrestha
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Sumaira Zaman
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Jeremy Bennett
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Susan L. McEvoy
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Madison Caballero
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Jill L. Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
6
Alkaloid production and response to natural adverse conditions in Peganum harmala: in silico transcriptome analyses. BIOTECHNOLOGIA 2022; 103:355-384. [PMID: 36685700 PMCID: PMC9837557 DOI: 10.5114/bta.2022.120706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Revised: 07/25/2022] [Accepted: 09/16/2022] [Indexed: 01/06/2023]
Abstract
Peganum harmala is a valuable wild plant that grows and survives under adverse conditions and produces pharmaceutical alkaloid metabolites. Using different assemblers to develop a transcriptome improves the quality of the assembled transcriptome. In this study, a concrete and accurate method for detecting stress-responsive transcripts by comparing stress-related gene ontology (GO) terms and public domains was designed. An integrated transcriptome for P. harmala including 42 656 coding sequences was created by merging de novo assembled transcriptomes. Around 35 000 transcripts were annotated with more than 90% resemblance to three closely related species of Citrus, which confirmed the robustness of the assembled transcriptome; 4853 stress-responsive transcripts were identified. CYP82 involved in alkaloid biosynthesis showed a higher number of transcripts in P. harmala than in other plants, indicating its diverse alkaloid biosynthesis attributes. Transcription factors (TFs) and regulatory elements with 3887 transcripts comprised 9% of the transcriptome. Among the TFs of the integrated transcriptome, cysteine2/histidine2 (C2H2) and WD40 repeat families were the most abundant. The Kyoto Encyclopedia of Genes and Genomes (KEGG) MAPK (mitogen-activated protein kinase) signaling map and the plant hormone signal transduction map showed the highest number of genes assigned to these pathways, suggesting their potential role in stress resistance. The P. harmala whole-transcriptome survey provides important resources and paves the way for functional and comparative genomic studies on this plant to discover stress-tolerance-related markers and response mechanisms in stress physiology, phytochemistry, ecology, biodiversity, and evolution. P. harmala can be a potential model for studying adverse environmental cues and metabolite biosynthesis and a major source for the production of various alkaloids.
7
Wang H, Qu M, Tang W, Liu S, Ding S. Transcriptome Profiling and Expression Localization of Key Sex-Related Genes in a Socially-Controlled Hermaphroditic Clownfish, Amphiprion clarkii. Int J Mol Sci 2022; 23:ijms23169085. [PMID: 36012348 PMCID: PMC9409170 DOI: 10.3390/ijms23169085] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/16/2022] [Revised: 08/03/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
Clownfish can be an excellent research model for investigating the socially-controlled sexual development of sequential hermaphrodite teleosts. However, the molecular cascades underlying the social cues that orchestrate the sexual development process remain poorly understood. Here, we performed a comparative transcriptomic analysis of gonads from females, males, and nonbreeders of Amphiprion clarkii, which constitute a complete social group, allowing us to investigate the molecular regulatory network under social control. Our analysis highlighted that the gonads of nonbreeders and males exhibited high similarities but were far from females, both in global transcriptomic profiles and histological characteristics, and identified numerous candidate genes involved in sexual development, some well-known and some novel. Significant upregulation of cyp19a1a, foxl2, nr5a1a, wnt4a, hsd3b7, and pgr in females provides strong evidence for the importance of steroidogenesis in ovarian development and maintenance, with cyp19a1a playing a central role. Amh and sox8 are two potential key factors that may regulate testicular tissue development in early and late stages, respectively, as they are expressed at higher levels in males than in females, but with slightly different expression timings. Unlike previous descriptions in other fishes, the unique expression pattern of dmrt1 in A. clarkii implied its potential function in both male and female gonads, and we speculated that it might play promoting roles in the early development of both testicular and ovarian tissues.
Affiliation(s)
- Huan Wang
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China
  - Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
- Meng Qu
  - Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
  - CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Southern Marine Science and Engineering Guangdong Laboratory (GML, Guangzhou), Guangzhou 511458, China
- Wei Tang
  - Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
- Shufang Liu
  - Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao 266071, China
  - Correspondence: (S.L.); (S.D.)
- Shaoxiong Ding
  - Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
  - Correspondence: (S.L.); (S.D.)
8
Srivastava AK, Srivastava R, Sharma A, Bharati AP, Yadav J, Singh AK, Tiwari PK, Srivatava AK, Chakdar H, Kashyap PL, Saxena AK. Transcriptome Analysis to Understand Salt Stress Regulation Mechanism of Chromohalobacter salexigens ANJ207. Front Microbiol 2022; 13:909276. [PMID: 35847097 PMCID: PMC9279137 DOI: 10.3389/fmicb.2022.909276] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/31/2022] [Accepted: 06/08/2022] [Indexed: 11/21/2022]
Abstract
Soil salinity is one of the major global issues affecting soil quality and agricultural productivity. The plant growth-promoting halophilic bacteria that can thrive in regions of high salt (NaCl) concentration have the ability to promote the growth of plants in salty environments. In this study, attempts have been made to understand the salinity adaptation of the plant growth-promoting, moderately halophilic bacterium Chromohalobacter salexigens ANJ207 at the genetic level through transcriptome analysis. In order to identify the stress-responsive genes, the transcriptome sequencing of C. salexigens ANJ207 under different salt concentrations was carried out. Among the 8,936 transcripts obtained, 93 were upregulated while 1,149 were downregulated when the NaCl concentration was increased from 5 to 10%. At 10% NaCl concentration, genes coding for lactate dehydrogenase, catalase, and OsmC-like protein were upregulated. On the other hand, when salinity was increased from 10 to 25%, 1,954 genes were upregulated, while 1,287 were downregulated. At 25% NaCl, genes coding for PNPase, potassium transporter, aconitase, excinuclease subunit ABC, and transposase were found to be upregulated. The quantitative real-time PCR analysis showed an increase in the transcript of genes related to the biosynthesis of glycine betaine from choline (gbcA, gbcB, and L-pro) and in the transcript of genes related to the uptake of glycine betaine (OpuAC, OpuAA, and OpuAB). The transcription of the genes involved in the biosynthesis of L-hydroxyproline (proD and proS) and one stress response proteolysis gene for periplasmic membrane stress sensing (serP) were also found to be increased. The presence of genes for various compatible solutes and their increase in expression at the high salt concentration indicated that a coordinated contribution by various compatible solutes might be responsible for salinity adaptation in ANJ207. 
The investigation provides new insights into the functional roles of various genes involved in salt stress tolerance and in tolerance of the oxidative stress produced by high salt concentration in ANJ207, and further supports the notion of utilizing the bacterium and its gene(s) to ameliorate the salinity problem in agriculture.
Affiliation(s)
- Alok Kumar Srivastava
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Ruchi Srivastava
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Anjney Sharma
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Akhilendra Pratap Bharati
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
  - Department of Life Science and Biotechnology, Chhatrapati Shahu Ji Maharaj University, Kanpur, India
- Jagriti Yadav
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Alok Kumar Singh
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Praveen Kumar Tiwari
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Anchal Kumar Srivatava
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Hillol Chakdar
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
- Prem Lal Kashyap
  - Indian Council of Agricultural Research-Indian Institute of Wheat and Barley Research, Karnal, India
- Anil Kumar Saxena
  - Indian Council of Agricultural Research-National Bureau of Agriculturally Important Microorganisms, Mau, India
9
Yang L, Wang H, Wang P, Gao M, Huang L, Cui X, Liu Y. De novo and comparative transcriptomic analysis explain morphological differences in Panax notoginseng taproots. BMC Genomics 2022; 23:86. [PMID: 35100996 PMCID: PMC8802446 DOI: 10.1186/s12864-021-08283-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/11/2021] [Accepted: 12/28/2021] [Indexed: 12/20/2022]
Abstract
Background Panax notoginseng (Burk.) F. H. Chen (PN) belonging to the genus Panax of family Araliaceae is widely used in traditional Chinese medicine to treat various diseases. PN taproot, as the most vital organ for the accumulation of bioactive components, presents a variable morphology (oval or long), even within the same environment. However, no related studies have yet explained the molecular mechanism of phenotypic differences. To investigate the cause of differences in the taproot phenotype, de novo and comparative transcriptomic analysis on PN taproot was performed. Results A total of 133,730,886 and 114,761,595 paired-end clean reads were obtained based on high-throughput sequencing from oval and long taproot samples, respectively. 121,955 unigenes with contig N50 = 1,774 bp were generated by using the de novo assembly transcriptome, 63,133 annotations were obtained with the BLAST. And then, 42 genes belong to class III peroxidase (PRX) gene family, 8 genes belong to L-Ascorbate peroxidase (APX) gene family, and 55 genes belong to a series of mitogen-activated protein kinase (MAPK) gene family were identified based on integrated annotation results. Differentially expressed genes analysis indicated substantial up-regulation of PnAPX3 and PnPRX45, which are related to reactive oxygen species metabolism, and the PnMPK3 gene, which is related to cell proliferation and plant root development, in long taproots compared with that in oval taproots. Furthermore, the determination results of real-time quantitative PCR, enzyme activity, and H2O2 content verified transcriptomic analysis results. Conclusion These results collectively demonstrate that reactive oxygen species (ROS) metabolism and the PnMPK3 gene may play vital roles in regulating the taproot phenotype of PN. This study provides further insights into the genetic mechanisms of phenotypic differences in other species of the genus Panax. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08283-w.
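The contig N50 reported in this abstract is a length-weighted median of the assembly. As a reminder of how that statistic is commonly computed, here is a minimal sketch; the function name `n50` is our own and the code is not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For contig lengths 2 through 10 (54 bases in total), the three longest contigs (10, 9, 8) already cover 27 bases, so the N50 is 8.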
Collapse
Affiliation(s)
- Lifang Yang
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650000, China
- Hanye Wang
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650000, China
- Panpan Wang
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650000, China
- Mingju Gao
- Wenshan University, Wenshan, 663000, China
- Luqi Huang
- National Resource Center for Chinese Materia Medica, Chinese Academy of Chinese Medical Sciences, Beijing, 100700, China
- Xiuming Cui
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650000, China; Key Laboratory of Panax notoginseng Resources Sustainable Development and Utilization of State Administration of Traditional Chinese Medicine, Kunming, 650000, China; Yunnan Provincial Key Laboratory of Panax notoginseng, Kunming, 650000, China; Kunming Key Laboratory of Sustainable Development and Utilization of Famous-Region Drug, Kunming, 650000, China; Sanqi Research Institute of Yunnan Province, Kunming, 650000, China
- Yuan Liu
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, 650000, China; Key Laboratory of Panax notoginseng Resources Sustainable Development and Utilization of State Administration of Traditional Chinese Medicine, Kunming, 650000, China; Yunnan Provincial Key Laboratory of Panax notoginseng, Kunming, 650000, China; Kunming Key Laboratory of Sustainable Development and Utilization of Famous-Region Drug, Kunming, 650000, China; Sanqi Research Institute of Yunnan Province, Kunming, 650000, China
|
10
|
CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure. PLoS Comput Biol 2021; 17:e1009631. [PMID: 34813594 PMCID: PMC8651127 DOI: 10.1371/journal.pcbi.1009631] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Revised: 12/07/2021] [Accepted: 11/11/2021] [Indexed: 11/19/2022] Open
Abstract
With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well established, de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools.
Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/. Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within, and subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely chimeras are to arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig-dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
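The idea that graph complexity signals possible chimerism can be illustrated with a toy de Bruijn graph. The Python sketch below is not CStone's algorithm and does not reproduce its three-level labelling; it only assumes that a graph is "ambiguous" when some node has more than one successor or more than one predecessor. The function names `de_bruijn` and `has_ambiguous_path` and the example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_ambiguous_path(graph):
    """True if any node has more than one successor or more than one predecessor."""
    predecessors = defaultdict(set)
    for node, successors in graph.items():
        if len(successors) > 1:
            return True
        for nxt in successors:
            predecessors[nxt].add(node)
    return any(len(p) > 1 for p in predecessors.values())


# Hypothetical reads: the second set diverges after the shared prefix "ACG".
print(has_ambiguous_path(de_bruijn(["ACGTT"], k=3)))
print(has_ambiguous_path(de_bruijn(["ACGTT", "ACGAA"], k=3)))
```

A contig traversing a branching graph requires a choice between paths, which is where an assembler can join pieces that do not belong to the same transcript.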
|
11
|
Lee SG, Na D, Park C. Comparability of reference-based and reference-free transcriptome analysis approaches at the gene expression level. BMC Bioinformatics 2021; 22:310. [PMID: 34674628 PMCID: PMC8529712 DOI: 10.1186/s12859-021-04226-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 11/10/2022] Open
Abstract
Background Lately, high-throughput RNA sequencing has been extensively used to elucidate the transcriptome landscape and dynamics of cell types of different species. In particular, for most non-model organisms lacking complete reference genomes with high-quality annotation of genetic information, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have substantially contributed toward understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have been conducted for assessing the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of results obtained by analyzing gene expression levels through these two different approaches has not been adequately documented. Results In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and revealed that the former tends to be satisfactorily replaced by the latter with respect to transcriptome repertoires, as well as from a gene expression quantification perspective. In addition, we urge cautious interpretation of these findings. Several genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method. Conclusions Our empirical results indicate important contributions toward addressing transcriptome-related biological questions in non-model organisms. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04226-0.
Affiliation(s)
- Sung-Gwon Lee
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea
- Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
|
12
|
Voshall A, Behera S, Li X, Yu XH, Kapil K, Deogun JS, Shanklin J, Cahoon EB, Moriyama EN. A consensus-based ensemble approach to improve transcriptome assembly. BMC Bioinformatics 2021; 22:513. [PMID: 34674629 PMCID: PMC8532302 DOI: 10.1186/s12859-021-04434-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Accepted: 10/10/2021] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative-splicing events exacerbates such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes. RESULTS In this study, we first provide a pipeline to generate a set of simulated benchmark transcriptomes and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including both de novo and genome-guided methods. The results showed that the assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or for genome-guided methods when the reference is not available from the same genome. To improve the transcriptome assembly performance, leveraging the overlapping predictions between different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. CONCLUSIONS Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any of the de novo assemblers we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods.
Furthermore, ConSemble using de novo assemblers matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy both for de novo and genome-guided assemblers can improve transcriptome assembly. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.
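The consensus idea described above, retaining predictions that several assemblies agree on, can be sketched in a few lines of Python. This is not the ConSemble script: exact sequence matching, the function name `consensus_contigs`, and the `min_support` threshold are all assumptions made for illustration, since the abstract does not state how overlap is defined or how many assemblies must agree.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support):
    """Keep sequences reported by at least `min_support` of the given assemblies."""
    counts = Counter()
    for contigs in assemblies:
        counts.update(set(contigs))  # each assembly votes at most once per sequence
    return {seq for seq, n in counts.items() if n >= min_support}


# Four hypothetical assemblies, represented as lists of contig sequences.
assemblies = [
    ["AAA", "CCC"],
    ["AAA", "GGG"],
    ["AAA", "CCC", "TTT"],
    ["CCC"],
]
print(sorted(consensus_contigs(assemblies, min_support=3)))
```

Raising `min_support` trades recall for precision: fewer sequences survive, but each is backed by more independent assemblies.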
Affiliation(s)
- Adam Voshall
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital/Harvard Medical School, Boston, MA, 02115, USA
- Sairam Behera
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Xiangjun Li
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Xiao-Hong Yu
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Kushagra Kapil
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- John Shanklin
- Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Edgar B Cahoon
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
|
13
|
Comprehensive Characterization of Multitissue Expression Landscape, Co-Expression Networks and Positive Selection in Pikeperch. Cells 2021; 10:cells10092289. [PMID: 34571938 PMCID: PMC8471114 DOI: 10.3390/cells10092289] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 08/27/2021] [Accepted: 08/29/2021] [Indexed: 11/19/2022] Open
Abstract
Promising efforts are ongoing to extend genomics resources for pikeperch (Sander lucioperca), a species of high interest for the sustainable European aquaculture sector. Although previous work, including reference genome assembly, transcriptome sequence, and single-nucleotide polymorphism genotyping, added a great wealth of genomic tools, a comprehensive characterization of gene expression across major tissues in pikeperch still remains an unmet research need. Here, we used deep RNA-Sequencing of ten vital tissues collected from eight animals to build a high-confidence, annotated transcriptome atlas, to detect the tissue-specificity of gene expression and co-expression network modules, and to investigate genome-wide selective signatures in the Percidae fish family. Pathway enrichment and protein–protein interaction network analyses were performed to characterize the unique biological functions of tissue-specific genes and co-expression modules. We detected strong functional correlations and similarities of tissues with respect to their expression patterns, but also significant differences in the complexity and composition of their transcriptomes. Moreover, functional analyses revealed that tissue-specific genes essentially play key roles in the specific physiological functions of the respective tissues. Identified network modules were also functionally coherent with tissues’ main physiological functions. Although tissue specificity was not associated with positive selection, several genes under selection were found to be involved in hypoxia, immunity, and gene regulation processes that are crucial for fish adaptation and welfare. Overall, these new resources and insights will not only enhance the understanding of mechanisms of organ biology in pikeperch, but also complement the genomic resources available for this commercial species.
|
14
|
Testone G, Sobolev AP, Mele G, Nicolodi C, Gonnella M, Arnesi G, Biancari T, Giannino D. Leaf nutrient content and transcriptomic analyses of endive (Cichorium endivia) stressed by downpour-induced waterlog reveal a gene network regulating kestose and inulin contents. HORTICULTURE RESEARCH 2021; 8:92. [PMID: 33931617 PMCID: PMC8087766 DOI: 10.1038/s41438-021-00513-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/13/2020] [Revised: 02/08/2021] [Accepted: 02/24/2021] [Indexed: 05/03/2023]
Abstract
Endive (Cichorium endivia L.), a vegetable consumed as fresh or packaged salads, is mostly cultivated outdoors and known to be sensitive to waterlogging in terms of yield and quality. Phenotypic, metabolic and transcriptomic analyses were used to study variations in curly- ('Domari', 'Myrna') and smooth-leafed ('Flester', 'Confiance') cultivars grown in short-term waterlog due to rainfall excess before harvest. After recording loss of head weights in all cultivars (6-35%), which was minimal in 'Flester', NMR untargeted profiling revealed variations as influenced by genotype, environment and interactions, and included drop of total carbohydrates (6-50%) and polyols (3-37%), gain of organic acids (2-30%) and phenylpropanoids (98-560%), and cultivar-specific fluctuations of amino acids (-37 to +15%). The analysis of differentially expressed genes showed GO term enrichment consistent with waterlog stress and included the carbohydrate metabolic process. The loss of sucrose, kestose and inulin recurred in all cultivars and the sucrose-inulin route was investigated by covering over 50 genes of sucrose branch and key inulin synthesis (fructosyltransferases) and catabolism (fructan exohydrolases) genes. The lowered expression of a sucrose gene subset together with that of SUCROSE:SUCROSE-1-FRUCTOSYLTRANSFERASE (1-SST) may have accounted for sucrose and kestose contents drop in the leaves of waterlogged plants. Two anti-correlated modules harbouring candidate hub-genes, including 1-SST, were identified by weighted gene correlation network analysis, and proposed to control positively and negatively kestose levels. In silico analysis further pointed at transcription factors of GATA, DOF, WRKY types as putative regulators of 1-SST.
Affiliation(s)
- Giulio Testone
- Institute for Biological Systems, National Research Council (CNR), Via Salaria Km 29,300 - 00015 Monterotondo, Rome, Italy
- Anatoly Petrovich Sobolev
- Institute for Biological Systems, National Research Council (CNR), Via Salaria Km 29,300 - 00015 Monterotondo, Rome, Italy
- Giovanni Mele
- Institute for Biological Systems, National Research Council (CNR), Via Salaria Km 29,300 - 00015 Monterotondo, Rome, Italy
- Chiara Nicolodi
- Institute for Biological Systems, National Research Council (CNR), Via Salaria Km 29,300 - 00015 Monterotondo, Rome, Italy
- Maria Gonnella
- Institute of Sciences of Food Production, CNR. Via G. Amendola 122/O - 70126, Bari, Italy
- Giuseppe Arnesi
- Enza Zaden Italia, Strada Statale Aurelia km. 96.400 - 01016 Tarquinia, Viterbo, Italy
- Tiziano Biancari
- Enza Zaden Italia, Strada Statale Aurelia km. 96.400 - 01016 Tarquinia, Viterbo, Italy
- Donato Giannino
- Institute for Biological Systems, National Research Council (CNR), Via Salaria Km 29,300 - 00015 Monterotondo, Rome, Italy
|
15
|
Behera S, Voshall A, Moriyama EN. Plant Transcriptome Assembly: Review and Benchmarking. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
|
16
|
Quick and efficient approach to develop genomic resources in orphan species: Application in Lavandula angustifolia. PLoS One 2020; 15:e0243853. [PMID: 33306734 PMCID: PMC7732122 DOI: 10.1371/journal.pone.0243853] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Accepted: 11/27/2020] [Indexed: 12/24/2022] Open
Abstract
Next-Generation Sequencing (NGS) technologies, by reducing the cost and increasing the throughput of sequencing, have opened doors to generate genomic data in a range of previously poorly studied species. In this study, we propose a method for the rapid development of large-scale molecular resources for orphan species. We studied as an example the true lavender (Lavandula angustifolia Mill.), a perennial sub-shrub plant native to the Mediterranean region whose essential oil has numerous applications in cosmetics, pharmaceuticals, and alternative medicines. The heterozygous clone “Maillette” was used as a reference for DNA and RNA sequencing. We first built a reference Unigene, composed of coding sequences, by de novo RNA-seq assembly. Then, we reconstructed the complete gene sequences (with introns and exons) using a Unigene-guided DNA-seq assembly approach. This aimed to maximize the possibilities of finding polymorphism between genetically close individuals despite the lack of a reference genome. Finally, we used these resources for SNP mining within a collection of 16 commercial lavender clones and tested the SNPs within the scope of a genetic distance analysis. We obtained a cleaned reference of 8,030 functionally in silico annotated genes. We found 359K polymorphic sites and observed a high SNP frequency (mean of 1 SNP per 90 bp) and a high level of heterozygosity (more than 60% of heterozygous SNPs per genotype). Overall, we found similar genetic distances between pairs of clones, which is probably related to the out-crossing nature of the species and the restricted area of cultivation. The proposed method is transferable to other orphan species, requires few bioinformatics resources and can be carried out within a year. This is also the first reported large-scale SNP development on Lavandula angustifolia.
All the genomic resources developed herein are publicly available and provide a rich pool of molecular resources to explore and exploit lavender genetic diversity in breeding programs.
|
17
|
Cogne Y, Gouveia D, Chaumot A, Degli-Esposti D, Geffard O, Pible O, Almunia C, Armengaud J. Proteogenomics-Guided Evaluation of RNA-Seq Assembly and Protein Database Construction for Emergent Model Organisms. Proteomics 2020; 20:e1900261. [PMID: 32249536 DOI: 10.1002/pmic.201900261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2019] [Revised: 03/24/2020] [Indexed: 11/10/2022]
Abstract
Proteogenomics is gaining momentum as, today, genomics, transcriptomics, and proteomics can be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species for which the genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments for Illumina RNA-seq reads before assembly are explored to translate the resulting contigs into useful polypeptide sequences. Experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans are used, and the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score of less than 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation using Transdecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
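The read pre-treatment described above, discarding reads whose mean quality score falls below 17, can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes FASTQ quality strings in Phred+33 encoding, and the function names `mean_phred` and `keep_read` and the example quality strings are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def keep_read(quality_string, min_mean_quality=17, offset=33):
    """True if the read's mean quality is at least the threshold."""
    return mean_phred(quality_string, offset) >= min_mean_quality


# Hypothetical quality strings: 'I' encodes Phred 40 and '#' encodes Phred 2.
print(keep_read("IIII"))
print(keep_read("####"))
```

The threshold is inclusive here because the abstract removes reads with a score "less than 17"; a read with a mean of exactly 17 is kept.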
Affiliation(s)
- Yannick Cogne
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Duarte Gouveia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Arnaud Chaumot
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Davide Degli-Esposti
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Geffard
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Pible
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Christine Almunia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Jean Armengaud
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
|
18
|
Wimberley J, Cahill J, Atamian HS. De novo Sequencing and Analysis of Salvia hispanica Tissue-Specific Transcriptome and Identification of Genes Involved in Terpenoid Biosynthesis. PLANTS 2020; 9:plants9030405. [PMID: 32213996 PMCID: PMC7154873 DOI: 10.3390/plants9030405] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Revised: 03/16/2020] [Accepted: 03/19/2020] [Indexed: 12/11/2022]
Abstract
Salvia hispanica (commonly known as chia) is gaining popularity worldwide as a healthy food supplement due to its low saturated fatty acid and high polyunsaturated fatty acid content, in addition to being rich in protein, fiber, and antioxidants. Chia leaves contain a plethora of secondary metabolites with medicinal properties. In this study, we sequenced chia leaf and root transcriptomes using the Illumina platform. The short reads were assembled into contigs using the Trinity software and annotated against the Uniprot database. The reads were de novo assembled into 103,367 contigs, which represented 92.8% transcriptome completeness and a diverse set of Gene Ontology terms. Differential expression analysis identified 6151 and 8116 contigs significantly upregulated in the leaf and root tissues, respectively. In addition, we identified 30 contigs belonging to the Terpene synthase (TPS) family and demonstrated their evolutionary relationships to tomato TPS family members. Finally, we characterized the expression of S. hispanica TPS members in leaves subjected to abiotic stresses and hormone treatments. Abscisic acid had the most pronounced effect on the expression of the TPS genes tested in this study. Our work provides valuable community resources for future studies aimed at improving and utilizing the beneficial constituents of this emerging healthy food source.
Affiliation(s)
- James Wimberley
- Computational and Data Sciences Program, Chapman University, Orange, CA 92866, USA
- Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Hagop S. Atamian
- Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Biological Sciences Program, Chapman University, Orange, CA 92866, USA
- Correspondence: Tel.: +1-(714)-289-2023
|
19
|
Hämälä T, Gorton AJ, Moeller DA, Tiffin P. Pleiotropy facilitates local adaptation to distant optima in common ragweed (Ambrosia artemisiifolia). PLoS Genet 2020; 16:e1008707. [PMID: 32210431 PMCID: PMC7135370 DOI: 10.1371/journal.pgen.1008707] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 04/06/2020] [Accepted: 03/05/2020] [Indexed: 12/23/2022] Open
Abstract
Pleiotropy, the control of multiple phenotypes by a single locus, is expected to slow the rate of adaptation by increasing the chance that beneficial alleles also have deleterious effects. However, a prediction arising from classical theory of quantitative trait evolution states that pleiotropic alleles may have a selective advantage when phenotypes are distant from their selective optima. We examine the role of pleiotropy in regulating adaptive differentiation among populations of common ragweed (Ambrosia artemisiifolia), a species that has recently expanded its North American range due to human-mediated habitat change. We employ a phenotype-free approach by using connectivity in gene networks as a proxy for pleiotropy. First, we identify loci bearing footprints of local adaptation, and then use genotype-expression mapping and co-expression networks to infer the connectivity of the genes. Our results indicate that the putatively adaptive loci are highly pleiotropic, as they are more likely than expected to affect the expression of other genes, and they reside in central positions within the gene networks. We propose that the conditionally advantageous alleles at these loci avoid the cost of pleiotropy by having large phenotypic effects that are beneficial when populations are far from their selective optima. We further use evolutionary simulations to show that these patterns are in agreement with a model where populations face novel selective pressures, as expected during a range expansion. Overall, our results suggest that highly connected genes may be targets of positive selection during environmental change, even though they likely experience strong purifying selection in stable selective environments.
Affiliation(s)
- Tuomas Hämälä
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, United States of America
- Amanda J. Gorton
- Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, Minnesota, United States of America
- David A. Moeller
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, United States of America
- Peter Tiffin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, United States of America
|
20
|
Comparative Analysis of Strategies for De Novo Transcriptome Assembly in Prokaryotes: Streptomyces clavuligerus as a Case Study. High Throughput 2019; 8:ht8040020. [PMID: 31801255 PMCID: PMC6970227 DOI: 10.3390/ht8040020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2019] [Revised: 11/20/2019] [Accepted: 11/23/2019] [Indexed: 12/15/2022] Open
Abstract
The performance of software tools for de novo transcriptome assembly greatly depends on the selection of software parameters. Up to now, the development of de novo transcriptome assembly for prokaryotes has not been as remarkable as that for eukaryotes. In this contribution, Rockhopper2 was used to perform a comparative transcriptome analysis of Streptomyces clavuligerus exposed to diverse environmental conditions. The study focused on assessing the influence of software parameters on software performance, with the identification of differentially expressed genes as the final goal. For this, a statistical optimization was performed using the Transrate Assembly Score (TAS). TAS was also used for evaluating the software performance and for comparing it with related tools, e.g., Trinity. Transcriptome redundancy and completeness were also considered for this analysis. Rockhopper2 and Trinity reached TAS values of 0.55092 and 0.58337, respectively. Trinity assembles transcriptomes with high redundancy, with 55.6% of transcripts having some duplicates. Additionally, we observed that the total number of differentially expressed genes (DEG) and their annotation greatly depend on the method used for removing redundancy and the tools used for transcript quantification. To our knowledge, this is the first work aimed at assessing de novo assembly software for prokaryotic organisms.
|
21
|
Voshall A, Moriyama EN. Next-generation transcriptome assembly and analysis: Impact of ploidy. Methods 2019; 176:14-24. [PMID: 31176772 DOI: 10.1016/j.ymeth.2019.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2019] [Revised: 05/30/2019] [Accepted: 06/01/2019] [Indexed: 10/26/2022] Open
Abstract
Whole genome duplications (WGD) occur widely in plants, but the effects of these events impact all branches of life. WGD events have major evolutionary impacts, often leading to major structural changes within the chromosomes and massive changes in gene expression that facilitate rapid speciation and gene diversification. Even for species that currently have diploid genomes, the impact of ancestral duplication events is still present in the genomes, especially in the context of highly similar gene families that are retained from WGD. However, the impact of polyploidy on various bioinformatics workflows has not been studied well. In this review, we give an overview of the biological significance of polyploidy in different organisms. We describe the impact of having polyploid transcriptomes on bioinformatics analyses, especially focusing on transcriptome assembly and transcript quantification. We discuss the benefits of using simulated benchmarking data when examining the performance of various methods. We also present an example strategy to generate simulated allopolyploid transcriptomes and RNAseq datasets and show how these benchmark datasets can be used to assess the performance of transcript assembly and quantification methods. Our benchmarking study shows that all transcriptome assembly methods are affected by having polyploid genomes. Quantification accuracy is also impacted by polyploidy depending on the method. These simulated datasets can be adapted for testing tasks such as read mapping, variant calling, and differential expression under biologically realistic conditions.
Affiliation(s)
- Adam Voshall
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
- Etsuko N Moriyama
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
22
Minio A, Massonnet M, Figueroa-Balderas R, Vondras AM, Blanco-Ulate B, Cantu D. Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development. G3 (Bethesda, Md.) 2019; 9:755-767. [PMID: 30642874 PMCID: PMC6404599 DOI: 10.1534/g3.118.201008] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/31/2018] [Accepted: 01/09/2019] [Indexed: 01/13/2023]
Abstract
Transcriptomics has been widely applied to study grape berry development. With few exceptions, transcriptomic studies in grape are performed using the available genome sequence, PN40024, as reference. However, differences in gene content among grape accessions, which contribute to phenotypic differences among cultivars, suggest that a single reference genome does not represent the species' entire gene space. Though whole genome assembly and annotation can reveal the relatively unique or "private" gene space of any particular cultivar, transcriptome reconstruction is a more rapid, less costly, and less computationally intensive strategy to accomplish the same goal. In this study, we used single-molecule real-time (SMRT) sequencing of full-length cDNA (Iso-Seq) to reconstruct the transcriptome of Cabernet Sauvignon berries during berry ripening. In addition, short reads from ripening berries were used to error-correct low-expression isoforms and to profile isoform expression. By comparing the annotated gene space of Cabernet Sauvignon to other grape cultivars, we demonstrate that the transcriptome reference built with Iso-Seq data represents most of the expressed genes in the grape berries and includes 1,501 cultivar-specific genes. Iso-Seq produced transcriptome profiles similar to those obtained after mapping on a complete genome reference. Together, these results justify the application of Iso-Seq to identify cultivar-specific genes and build a comprehensive reference for transcriptional profiling that circumvents the necessity of a genome reference with its associated costs and computational weight.
Affiliation(s)
- Andrea Minio
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Mélanie Massonnet
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Amanda M Vondras
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Dario Cantu
- Department of Viticulture and Enology, University of California Davis, Davis, CA
23
Xia Y, Luo W, Yuan S, Zheng Y, Zeng X. Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species. BMC Genomics 2018; 19:886. [PMID: 30526480 PMCID: PMC6286531 DOI: 10.1186/s12864-018-5329-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/10/2017] [Accepted: 11/28/2018] [Indexed: 11/14/2022] Open
Abstract
Background: Even though microsatellite loci have frequently been isolated using recently developed next-generation sequencing (NGS) techniques, this task is still difficult because the subsequent polymorphism screening requires a substantial amount of time. Selecting appropriate polymorphic microsatellites is a critical issue for ecological and evolutionary studies. However, the extent to which assembly strategy, read length, sequencing depth, and library layout produce a measurable effect on microsatellite marker development remains unclear. Here, we use six frog species for genome skimming and two frog species for transcriptome sequencing to develop microsatellite markers, and investigate the effect of different isolation strategies on the yield of microsatellites. Results: The number of isolated microsatellites increases with increased data quantity and read length. Assembly strategy could influence the yield and the polymorphism of microsatellite development. Larger k-mer sizes produced fewer microsatellite loci in total, but these loci had a longer repeat length, suggesting greater polymorphism. However, the proportion of each type of nucleotide repeat was not affected; dinucleotide repeats were always the dominant type. Finally, the transcriptomic microsatellites displayed lower levels of polymorphism and were less abundant than genomic microsatellites, but were more likely to be functionally linked loci. Conclusions: These observations provide deep insight into the evolution and distribution of microsatellites and how different isolation strategies affect microsatellite development using NGS.
Affiliation(s)
- Yun Xia
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Wei Luo
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Siqi Yuan
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China; College of Bioengineering, Sichuan University of Science & Engineering, Zigong, 643000, China
- Yuchi Zheng
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Xiaomao Zeng
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
24
Gubaev RF, Gorshkov VY, Gapa LM, Gogoleva NE, Vetchinkina EP, Gogolev YV. Algorithm for Physiological Interpretation of Transcriptome Profiling Data for Non-Model Organisms. Mol Biol 2018. [DOI: 10.1134/s0026893318040076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/28/2022]
25
Carruthers M, Yurchenko AA, Augley JJ, Adams CE, Herzyk P, Elmer KR. De novo transcriptome assembly, annotation and comparison of four ecological and evolutionary model salmonid fish species. BMC Genomics 2018; 19:32. [PMID: 29310597 PMCID: PMC5759245 DOI: 10.1186/s12864-017-4379-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 06/12/2017] [Accepted: 12/11/2017] [Indexed: 12/21/2022] Open
Abstract
Background: Salmonid fishes exhibit high levels of phenotypic and ecological variation and are thus ideal model systems for studying evolutionary processes of adaptive divergence and speciation. Furthermore, salmonids are of major interest in fisheries, aquaculture, and conservation research. Improving understanding of the genetic mechanisms underlying traits in these species would significantly progress research in these fields. Here we generate high quality de novo transcriptomes for four salmonid species: Atlantic salmon (Salmo salar), brown trout (Salmo trutta), Arctic charr (Salvelinus alpinus), and European whitefish (Coregonus lavaretus). All species except Atlantic salmon have no reference genome publicly available and few if any genomic studies to date. Results: We used paired-end RNA-seq on Illumina to generate high coverage sequencing of multiple individuals, yielding between 180 and 210 M reads per species. After initial assembly, strict filtering was used to remove duplicated, redundant, and low confidence transcripts. The final assemblies consisted of 36,505 protein-coding transcripts for Atlantic salmon, 35,736 for brown trout, 33,126 for Arctic charr, and 33,697 for European whitefish and are made publicly available. Assembly completeness was assessed using three approaches, all of which supported high quality of the assemblies: 1) ~78% of Actinopterygian single-copy orthologs were successfully captured in our assemblies, 2) orthogroup inference identified high overlap in the protein sequences present across all four species (40% shared across all four and 84% shared by at least two), and 3) comparison with the published Atlantic salmon genome suggests that our assemblies represent well covered (~98%) protein-coding transcriptomes. Thorough comparison of the generated assemblies found that 84-90% of transcripts in each assembly were orthologous with at least one of the other three species. We also identified 34-37% of transcripts in each assembly as paralogs. We further compare completeness and annotation statistics of our new assemblies to available related species. Conclusion: New, high-confidence protein-coding transcriptomes were generated for four ecologically and economically important species of salmonids. This offers a high quality pipeline for such complex genomes, represents a valuable contribution to the existing genomic resources for these species and provides robust tools for future investigation of gene expression and sequence evolution in these and other salmonid species.
Affiliation(s)
- Madeleine Carruthers
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK
- Andrey A Yurchenko
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK
- Julian J Augley
- Glasgow Polyomics, Wolfson Wohl Cancer Research Centre, University of Glasgow, G61 1QH, Glasgow, UK; Present Address: Fios Genomics Ltd., Nine Edinburgh Bioquarter, 9 Little France Road, Edinburgh, EH16 4UX, UK
- Colin E Adams
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK; Scottish Centre for Ecology and the Natural Environment, University of Glasgow, Rowardennan, G63 0AW, UK
- Pawel Herzyk
- Glasgow Polyomics, Wolfson Wohl Cancer Research Centre, University of Glasgow, G61 1QH, Glasgow, UK; Institute of Molecular, Cell & Systems Biology, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK
- Kathryn R Elmer
- Institute of Biodiversity, Animal Health & Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, G12 8QQ, Glasgow, UK
26
Giosa D, Felice MR, Lawrence TJ, Gulati M, Scordino F, Giuffrè L, Lo Passo C, D'Alessandro E, Criseo G, Ardell DH, Hernday AD, Nobile CJ, Romeo O. Whole RNA-Sequencing and Transcriptome Assembly of Candida albicans and Candida africana under Chlamydospore-Inducing Conditions. Genome Biol Evol 2017; 9:1971-1977. [PMID: 28810711 PMCID: PMC5553385 DOI: 10.1093/gbe/evx143] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 07/25/2017] [Indexed: 12/27/2022] Open
Abstract
Candida albicans is the most common cause of life-threatening fungal infections in humans, especially in immunocompromised individuals. Crucial to its success as an opportunistic pathogen is the considerable dynamism of its genome, which readily undergoes genetic changes generating new phenotypes and shaping the evolution of new strains. Candida africana is an intriguing C. albicans biovariant strain that exhibits remarkable genetic and phenotypic differences when compared with standard C. albicans isolates. Candida africana is well-known for its low degree of virulence compared with C. albicans and for its inability to produce chlamydospores that C. albicans, characteristically, produces under certain environmental conditions. Chlamydospores are large, spherical structures, whose biological function is still unknown. For this reason, we have sequenced, assembled, and annotated the whole transcriptomes obtained from an efficient C. albicans chlamydospore-producing clinical strain (GE1), compared with the natural chlamydospore-negative C. africana clinical strain (CBS 11016). The transcriptomes of both C. albicans (GE1) and C. africana (CBS 11016) clinical strains, grown under chlamydospore-inducing conditions, were sequenced and assembled into 7,442 (GE1 strain) and 8,370 (CBS 11016 strain) high quality transcripts, respectively. The release of the first assembly of the C. africana transcriptome will allow future comparative studies to better understand the biology and evolution of this important human fungal pathogen.
Affiliation(s)
- Maria Rosa Felice
- Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Italy
- Travis J Lawrence
- Department of Molecular and Cell Biology, University of California, Merced, CA; Quantitative and System Biology Graduate Program, University of California, Merced, CA
- Megha Gulati
- Department of Molecular and Cell Biology, University of California, Merced, CA
- Letterio Giuffrè
- Department of Veterinary Sciences, Division of Animal Production, University of Messina, Italy
- Carla Lo Passo
- Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Italy
- Enrico D'Alessandro
- Department of Veterinary Sciences, Division of Animal Production, University of Messina, Italy
- Giuseppe Criseo
- Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Italy
- David H Ardell
- Department of Molecular and Cell Biology, University of California, Merced, CA
- Aaron D Hernday
- Department of Molecular and Cell Biology, University of California, Merced, CA
- Clarissa J Nobile
- Department of Molecular and Cell Biology, University of California, Merced, CA
- Orazio Romeo
- IRCCS Centro Neurolesi "Bonino-Pulejo," Messina, Italy; Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Italy
27
Fantastic Beasts and How To Sequence Them: Ecological Genomics for Obscure Model Organisms. Trends Genet 2017; 34:121-132. [PMID: 29198378 DOI: 10.1016/j.tig.2017.11.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 07/19/2017] [Revised: 10/30/2017] [Accepted: 11/07/2017] [Indexed: 01/05/2023]
Abstract
The application of genomic approaches to 'obscure model organisms' (OMOs), meaning species with no prior genomic resources, enables increasingly sophisticated studies of the genomic basis of evolution, acclimatization, and adaptation in real ecological contexts. I consider here ecological questions that can be addressed using OMOs, and indicate optimal sequencing and data-handling solutions for each case. With this I hope to promote the diversity of OMO-based projects that would capitalize on the peculiarities of the natural history of OMOs and could feasibly be completed within the scope of a single PhD thesis.
28
Challenges and advances for transcriptome assembly in non-model species. PLoS One 2017; 12:e0185020. [PMID: 28931057 PMCID: PMC5607178 DOI: 10.1371/journal.pone.0185020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/30/2017] [Accepted: 09/04/2017] [Indexed: 12/28/2022] Open
Abstract
Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly using mapping to assign reads prior to assembly. Given the limits of mapping reads to a reference when it is highly divergent, as is frequently the case for non-model species, we evaluate whether using blastn would outperform mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance by using simulated reads of lengths corresponding to those generated by the most common sequencing platforms, and over a realistic range of genetic divergence (0% to 30% divergence). Here we focus on gene identification and not on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes irrespective of read length at 0% divergence; however, assignment rate of reads is negatively correlated with both increasing divergence level and reducing read lengths. Nevertheless, we still observe 92.6% of recovered genes at 30% divergence irrespective of read length. This analysis also produces a categorization of genes relative to their assignment, and suggests guidelines for data processing prior to analyses of comparative transcriptomics and gene expression to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performances of de novo assembly alone vs in combination with a transcriptome-guided assembly based on blastn both via simulation and empirically, using data from a cyprinid fish species and from an oak species. For any simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods. Combining de novo assembly and a related reference transcriptome for read assignment also addresses the bias/error in contigs caused by the dependence on a related reference alone. Empirical data corroborate these findings when assembling transcriptomes from the two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from D. rerio, the guided and de novo assemblies recover respectively 20,605 and 20,032 genes but the performance of the guided assembly approach is much higher for both the contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the new guided approach detects 16,326 genes where the de novo assembly only detects 9,385 genes.
29
Liu H, Smith TPL, Nonneman DJ, Dekkers JCM, Tuggle CK. A high-quality annotated transcriptome of swine peripheral blood. BMC Genomics 2017. [PMID: 28646867 PMCID: PMC5483264 DOI: 10.1186/s12864-017-3863-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022] Open
Abstract
Background: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes and/or transcriptomes. However, neither the reference genome nor the peripheral blood transcriptome of the pig has been sufficiently assembled and annotated to support such profiling assays in this emerging biomedical model organism. We aimed to assemble published and novel RNA-seq data to provide a comprehensive, well-annotated blood transcriptome for pigs by integrating a de novo assembly with a genome-guided assembly. Results: A de novo and a genome-guided transcriptome of porcine whole peripheral blood were assembled with ~162 million pairs of paired-end and ~183 million single-end, trimmed and normalized Illumina RNA-seq reads (~6 billion initial reads from 146 RNA-seq libraries) from five independent studies by using the Trinity and Cufflinks software, respectively. We then removed putative transcripts (PTs) of low confidence from both assemblies and merged the remaining PTs into an integrated transcriptome consisting of 132,928 PTs, with 126,225 (~95%) PTs from the de novo assembly and more than 91% of PTs spliced. In the integrated transcriptome, ~90% and 63% of PTs had significant sequence similarity to sequences in the NCBI NT and NR databases, respectively; 68,754 (~52%) PTs were annotated with 15,965 unique gene ontology (GO) terms; and 7,618 PTs annotated with Enzyme Commission codes were assigned to 134 pathways curated by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Full exon-intron junctions of 17,528 PTs were validated by PacBio IsoSeq full-length cDNA reads from 3 other porcine tissues, NCBI pig RefSeq mRNAs and transcripts from Ensembl Sscrofa10.2 annotation. Completeness of the 5’ termini of 37,569 PTs was validated by public cap analysis of gene expression (CAGE) data. By comparison to the Ensembl transcripts, we found that (1) the deduced precursors of 54,402 PTs shared at least one intron or exon with those of 18,437 Ensembl transcripts; (2) 12,262 PTs had both longer 5’ and 3’ termini than their maximally overlapping Ensembl transcripts; and (3) 41,838 spliced PTs were totally missing from the Sscrofa10.2 annotation. Similar results were obtained when the PTs were compared to the pig NCBI RefSeq mRNA collection. Conclusions: We built, validated and annotated a comprehensive porcine blood transcriptome with significant improvement over the annotation of Ensembl Sscrofa10.2 and the pig NCBI RefSeq mRNAs, and laid a foundation for blood-based high throughput transcriptomic assays in pigs and for advancing annotation of the pig genome.
Affiliation(s)
- Haibo Liu
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, 2258 Kildee Hall, Ames, IA, 50011, USA
- Timothy P L Smith
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA
- Dan J Nonneman
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE, 68933, USA
- Jack C M Dekkers
- Department of Animal Science, Iowa State University, 239 Kildee Hall, Ames, IA, 50011, USA
- Christopher K Tuggle
- Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA, 50011, USA