1
Saldarriaga-Córdoba M, Clavero-León C, Rey-Suarez P, Nuñez-Rangel V, Avendaño-Herrera R, Solano-González S, Alzate JF. Unveiling Novel Kunitz- and Waprin-Type Toxins in the Micrurus mipartitus Coral Snake Venom Gland: An In Silico Transcriptome Analysis. Toxins (Basel) 2024; 16:224. [PMID: 38787076 PMCID: PMC11126030 DOI: 10.3390/toxins16050224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/23/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024]
Abstract
Kunitz-type peptide expression has been described in the venom of snakes of the Viperidae, Elapidae and Colubridae families. This work aimed to identify these peptides in the venom gland transcriptome of the coral snake Micrurus mipartitus. Transcriptomic analysis revealed a high diversity of venom-associated Kunitz serine protease inhibitor proteins (KSPIs). A total of eight KSPI copies were predicted and grouped into four distinct types: short KSPI, long KSPI, Kunitz-Waprin (Ku-WAP) proteins, and a multi-domain Kunitz-type protein. Of these, one short KSPI showed high identity with Micrurus tener and Austrelaps superbus. The long KSPI group showed similarity within the Micrurus genus and homology with various elapid snakes and even with the colubrid Pantherophis guttatus. A third group suggested the presence of Kunitz domains in addition to a whey-acidic-protein-type four-disulfide core domain. Finally, the fourth group corresponded to a transcript copy encoding a putative 511-amino-acid protein, formerly annotated as a KSPI, which UniProt classified as SPINT1. In conclusion, this study showed the diversity of Kunitz-type proteins expressed in the venom gland transcriptome of M. mipartitus.
Affiliation(s)
- Claudia Clavero-León
- Centro de Investigación en Recursos Naturales y Sustentabilidad (CIRENYS), Universidad Bernardo O’Higgins, Santiago 8320000, Chile
- Paola Rey-Suarez
- Grupo de Investigación en Toxinología, Alternativas Terapéuticas y Alimentarias, Facultad de Ciencias Farmacéuticas y Alimentarias, Universidad de Antioquia, Medellín 50010, Colombia
- Vitelbina Nuñez-Rangel
- Grupo de Investigación en Toxinología, Alternativas Terapéuticas y Alimentarias, Facultad de Ciencias Farmacéuticas y Alimentarias, Universidad de Antioquia, Medellín 50010, Colombia
- Escuela de Microbiología, Universidad de Antioquia, Medellín 50010, Colombia
- Ruben Avendaño-Herrera
- Facultad de Ciencias de la Vida & Centro de Investigación Marina Quintay (CIMARQ), Universidad Andrés Bello, Viña del Mar 2531015, Chile
- Stefany Solano-González
- Laboratorio de Bioinformática Aplicada, Escuela de Ciencias Biológicas, Universidad Nacional, Heredia 86-3000, Costa Rica
- Juan F. Alzate
- Departamento de Microbiología y Parasitología, Facultad de Medicina, Universidad de Antioquia, Medellín 50010, Colombia
2
Khelghatibana F, Javan-Nikkhah M, Safaie N, Sobhani A, Shams S, Sari E. A reference transcriptome for walnut anthracnose pathogen, Ophiognomonia leptostyla, guides the discovery of candidate virulence genes. Fungal Genet Biol 2023; 169:103828. [PMID: 37657751 DOI: 10.1016/j.fgb.2023.103828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 08/13/2023] [Accepted: 08/28/2023] [Indexed: 09/03/2023]
Abstract
Despite the economic losses caused by walnut anthracnose, Ophiognomonia leptostyla is an orphan fungus with respect to genomic resources. In the present study, the transcriptome of O. leptostyla was assembled for the first time. RNA sequencing was conducted on fungal mycelia grown in liquid medium and on walnut leaves inoculated with fungal conidia, sampled at 48, 96 and 144 h post inoculation (hpi). The completeness, correctness, and contiguity of the de novo transcriptome assemblies generated with Trinity, Oases, SOAPdenovo-Trans and Bridger were compared to identify a single superior reference assembly. On most assessment criteria, including N50, Transrate score, number of ORFs with a known description in GenBank, the percentage of reads mapped back to the transcripts (RMBT), BUSCO score, Swiss-Prot coverage bin and RSEM-EVAL score, the Bridger assembly was superior and was therefore used as the reference for profiling the O. leptostyla transcriptome in liquid medium vs. during walnut infection. The k-means clustering of transcripts yielded four distinct transcription patterns across the three sampling time points. Most of the detected CAZy transcripts showed elevated transcription at 96 hpi, which is hypothesized to coincide with the onset of intracellular growth. The in silico analysis revealed 103 candidate effectors, six of which were members of the Necrosis and Ethylene Inducing Like Protein (NLP) gene family and belonged to three distinct k-means clusters. This study revealed a complex temporal pattern of CAZy and candidate effector transcription during the six days following O. leptostyla inoculation of walnut leaves, and provides a list of candidate virulence genes for validation in future studies.
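The k-means step groups transcripts whose expression profiles across the sampling points are similar. Below is a minimal sketch of Lloyd's k-means in plain Python. It assumes each transcript is represented by a tuple of expression values, one per time point. The initialisation from the first k points, the function name `kmeans` and the toy values are illustrative choices, not the authors' pipeline. Real analyses usually normalise or log-transform profiles before clustering.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on equal-length numeric tuples.

    Centroids are initialised from the first k points (an illustrative,
    deterministic choice; k-means++ or random restarts are more common).
    Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy profiles at three time points (e.g. 48, 96, 144 hpi); the values are made up.
profiles = [(0.0, 0.0, 0.0), (0.0, 0.1, 0.0), (10.0, 10.0, 10.0), (10.0, 10.1, 10.0)]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 0, 1, 1]
```

With four clusters and three time points, as in the study, the same function would be called with `k=4` on three-element profiles.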
Affiliation(s)
- Fatemeh Khelghatibana
- Department of Plant Pathology, Iranian Research Institute of Plant Protection, Agricultural Research, Education and Extension Organization (AREEO), Tehran, Iran
- Mohammad Javan-Nikkhah
- Department of Plant Protection, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran
- Naser Safaie
- Department of Plant Pathology, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
- Ahmad Sobhani
- Agricultural Biotechnology Research Institute of Iran - Isfahan Branch, Agricultural Research, Education and Extension Organization (AREEO), Isfahan, Iran
- Somayeh Shams
- Department of Plant Production and Genetic Engineering, Faculty of Agriculture, University of Lorestan, Khorramabad, Iran
- Ehsan Sari
- Department of Microbiology and Plant Pathology, University of California, Riverside, CA, USA
3
Liu S, Koslicki D. CMash: fast, multi-resolution estimation of k-mer-based Jaccard and containment indices. Bioinformatics 2022; 38:i28-i35. [PMID: 35758788 PMCID: PMC9235470 DOI: 10.1093/bioinformatics/btac237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Motivation K-mer-based methods are used ubiquitously in computational biology. However, choosing the optimal value of k for a specific application often remains heuristic. Simply reconstructing a new k-mer set with another k-mer size is computationally expensive, especially in metagenomic analysis where datasets are large. Here, we introduce a hashing-based technique that leverages a kind of bottom-m sketch as well as a k-mer ternary search tree (KTST) to obtain k-mer-based similarity estimates for a range of k values. By truncating k-mers stored in a pre-built KTST with a large k = kmax, we can simultaneously obtain k-mer-based estimates for all k values up to kmax. This truncation approach avoids reconstructing new k-mer sets when k changes, making analysis more time- and space-efficient. Results We derived a theoretical expression for the bias factor due to truncation and showed that the biases are negligible in practice: when using a KTST to estimate the containment index between a RefSeq-based microbial reference database and simulated metagenome data for 10 values of k, the running time was close to 10× faster than a classic MinHash approach while using less than one-fifth of the space to store the data structure. Availability and implementation A Python implementation of this method, CMash, is available at https://github.com/dkoslicki/CMash. All experiments presented herein can be reproduced via https://github.com/KoslickiLab/CMASH-reproducibles. Supplementary information Supplementary data are available at Bioinformatics online.
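The two quantities CMash estimates, the Jaccard index |A ∩ B| / |A ∪ B| and the containment index |A ∩ B| / |A|, can be illustrated with a small sketch. It builds exact k-mer sets, takes bottom-m sketches (the m smallest hash values), and shows the truncation idea of deriving k-mers for a smaller k from stored k_max-mers. This is a simplified illustration, not CMash's implementation. The assumptions are SHA-1-derived 64-bit hashes, exact membership tests against the larger set for containment, and Python sets in place of a ternary search tree. All function names are hypothetical.

```python
import hashlib

def kmers(seq, k):
    """All length-k substrings (k-mers) of seq, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def truncate(kmer_set, k):
    """Derive k-mers for a smaller k by keeping the length-k prefixes of stored longer k-mers.

    k-mers that start in the last (k_max - k) positions of a sequence are never
    produced this way, so the result can be a strict subset of kmers(seq, k).
    """
    return {km[:k] for km in kmer_set}

def h(kmer):
    """Deterministic 64-bit hash (first 8 bytes of SHA-1); an illustrative choice."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")

def bottom_m(kmer_set, m):
    """Bottom-m sketch: the m smallest hash values of the set."""
    return set(sorted(h(km) for km in kmer_set)[:m])

def jaccard_estimate(a, b, m):
    """Estimate |A ∩ B| / |A ∪ B| from the bottom-m sketch of the union."""
    sa, sb = bottom_m(a, m), bottom_m(b, m)
    union_sketch = set(sorted(sa | sb)[:m])  # equals the bottom-m sketch of A ∪ B
    return len(union_sketch & sa & sb) / len(union_sketch)

def containment_estimate(a, b, m):
    """Estimate |A ∩ B| / |A| (how much of A is contained in B).

    Simplification: A's bottom-m hashes are checked against the full hash set of B.
    """
    sa = bottom_m(a, m)
    hb = {h(km) for km in b}
    return len(sa & hb) / len(sa)

seq = "ACGTACGTTGCAACGTAAGT"
stored = kmers(seq, 8)  # built once with k_max = 8
for k in (4, 6, 8):
    # Truncated k-mers are always a subset of the k-mers recomputed from scratch.
    print(k, truncate(stored, k) <= kmers(seq, k))
```

When m is at least the size of the union, both estimates reduce to the exact indices (barring hash collisions). Smaller m trades accuracy for a fixed sketch size.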
Affiliation(s)
- Shaopeng Liu
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA
- David Koslicki
- Huck Institutes of Life Sciences, Pennsylvania State University, State College, PA 16801, USA; Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16801, USA; Department of Biology, Pennsylvania State University, State College, PA 16801, USA
4
Improving the Annotation of the Venom Gland Transcriptome of Pamphobeteus verdolaga, Prospecting Novel Bioactive Peptides. Toxins (Basel) 2022; 14:toxins14060408. [PMID: 35737069 PMCID: PMC9228390 DOI: 10.3390/toxins14060408] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2022] [Revised: 06/06/2022] [Accepted: 06/07/2022] [Indexed: 02/01/2023]
Abstract
Spider venoms constitute a trove of novel peptides of biotechnological interest. Because little next-generation sequencing (NGS) data has been generated, less than 1% of these peptides have been described. Increasing evidence suggests that a single transcriptome assembler underestimates the number of genes that can be assembled. Here, the transcriptome of the venom gland of the spider Pamphobeteus verdolaga was re-assembled using three freely available algorithms, Trinity, SOAPdenovo-Trans, and SPAdes, to obtain a more complete annotation. Assembler performance was evaluated by contig number, N50, read representation in the assembly, and BUSCO retrieval against the arthropod dataset. Of all the sequences assembled by the three programs, 39.26% were common to all three assemblers, 27.88% were uniquely assembled by Trinity, and 27.65% were uniquely assembled by SPAdes. Non-redundant merging of the output of all three assemblies permitted the annotation of 9232 sequences, 23% more than with any single assembler and 28% more than the previous P. verdolaga annotation; moreover, it enabled the description of 65 novel theraphotoxins. When generating data for non-model organisms, as well as when searching for novel peptides of biotechnological interest, it is highly recommended to employ at least two different transcriptome assemblers.
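The shared and unique fractions and the non-redundant merge described above can be sketched with Python sets. For illustration only, this assumes that two assembled sequences count as the same only when they are identical. Real pipelines usually cluster similar sequences first (for example with a tool such as CD-HIT). It also assumes percentages are taken relative to the merged set. The assembler outputs below are made-up toy data.

```python
def overlap_summary(assemblies):
    """assemblies: dict mapping assembler name -> iterable of assembled sequences.

    Returns the non-redundant merged set, the sequences shared by all assemblers,
    and, for each assembler, the sequences that no other assembler produced.
    """
    sets = {name: set(seqs) for name, seqs in assemblies.items()}
    merged = set().union(*sets.values())       # non-redundant union of all outputs
    common = set.intersection(*sets.values())  # produced by every assembler
    unique = {
        name: s - set().union(*(o for other, o in sets.items() if other != name))
        for name, s in sets.items()
    }
    return merged, common, unique

toy = {
    "Trinity": ["MKTA", "MKLV", "MRRS"],
    "SOAPdenovo-Trans": ["MKTA", "MKLV"],
    "SPAdes": ["MKTA", "MGGW"],
}
merged, common, unique = overlap_summary(toy)
print(f"common to all: {100 * len(common) / len(merged):.2f}%")
for name, seqs in unique.items():
    print(f"unique to {name}: {100 * len(seqs) / len(merged):.2f}%")
```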
5
Adolfo LM, Rao X, Dixon RA. Identification of Pueraria spp. through DNA barcoding and comparative transcriptomics. BMC Plant Biol 2022; 22:10. [PMID: 34979934 PMCID: PMC8722073 DOI: 10.1186/s12870-021-03383-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/12/2021] [Accepted: 12/05/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Kudzu is a term used generically to describe members of the genus Pueraria. Kudzu roots have been used for centuries in traditional Chinese medicine in view of their high levels of beneficial isoflavones including the unique 8-C-glycoside of daidzein, puerarin. In the US, kudzu is seen as a noxious weed causing ecological and economic damage. However, not all kudzu species make puerarin or are equally invasive. Kudzu remains difficult to identify due to its diverse morphology and inconsistent nomenclature. RESULTS We have generated sequences for the internal transcribed spacer 2 (ITS2) and maturase K (matK) regions of Pueraria montana lobata, P. montana montana, and P. phaseoloides, and identified two accessions previously used for differential analysis of puerarin biosynthesis as P. lobata and P. phaseoloides. Additionally, we have generated root transcriptomes for the puerarin-producing P. m. lobata and the non-puerarin producing P. phaseoloides. Within the transcriptomes, microsatellites were identified to aid in species identification as well as population diversity. CONCLUSIONS The barcode sequences generated will aid in fast and efficient identification of the three kudzu species. Additionally, the microsatellites identified from the transcriptomes will aid in genetic analysis. The root transcriptomes also provide a molecular toolkit for comparative gene expression analysis towards elucidation of the biosynthesis of kudzu phytochemicals.
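Microsatellites (simple sequence repeats, SSRs), like those mined from the kudzu root transcriptomes, are tandem repeats of short motifs. Below is a minimal regex-based sketch of SSR detection. The minimum repeat counts per motif length in `MIN_REPEATS` are illustrative assumptions, not the thresholds used in the study. Dedicated SSR tools also handle compound and imperfect repeats, which this sketch ignores.

```python
import re

# Illustrative minimum number of repeat units per motif length (assumption).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ACAC')."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, r in min_repeats.items():
        # Group 2 captures one k-base motif; \2{r-1,} requires at least r copies in total.
        pattern = re.compile("(([ACGT]{%d})\\2{%d,})" % (k, r - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):  # skip e.g. 'AA' repeats already reported as mono-A
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)

print(find_ssrs("GG" + "AC" * 7 + "TTT"))  # [(2, 'AC', 7)]
```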
Affiliation(s)
- Laci M Adolfo
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, 1155 Union Circle #305220, Denton, TX, 76203-5017, USA
- Xiaolan Rao
- College of Life Sciences, Hubei University, Wuhan, 430068, Hubei Province, China
- Richard A Dixon
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, 1155 Union Circle #305220, Denton, TX, 76203-5017, USA
6
Lee SG, Na D, Park C. Comparability of reference-based and reference-free transcriptome analysis approaches at the gene expression level. BMC Bioinformatics 2021; 22:310. [PMID: 34674628 PMCID: PMC8529712 DOI: 10.1186/s12859-021-04226-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 11/10/2022]
Abstract
Background In recent years, high-throughput RNA sequencing has been used extensively to elucidate the transcriptome landscape and dynamics of cell types in different species. In particular, for most non-model organisms, which lack complete reference genomes with high-quality annotation, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have contributed substantially toward understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have assessed the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of gene expression levels obtained through these two approaches has not been adequately documented. Results In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and revealed that the former tends to be satisfactorily replaced by the latter with respect to transcriptome repertoires, as well as from a gene expression quantification perspective. We nevertheless urge cautious interpretation of these findings: genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method. Conclusions Our empirical results make important contributions toward addressing transcriptome-related biological questions in non-model organisms. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04226-0.
Affiliation(s)
- Sung-Gwon Lee
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea
- Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
7
Voshall A, Behera S, Li X, Yu XH, Kapil K, Deogun JS, Shanklin J, Cahoon EB, Moriyama EN. A consensus-based ensemble approach to improve transcriptome assembly. BMC Bioinformatics 2021; 22:513. [PMID: 34674629 PMCID: PMC8532302 DOI: 10.1186/s12859-021-04434-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2020] [Accepted: 10/10/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high-quality transcriptomes is still not a trivial problem, especially for non-model organisms, for which adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Furthermore, alternative splicing exacerbates these already difficult assembly problems. While benchmarking transcriptome assemblies is critical, it is also not trivial because true reference transcriptomes are generally lacking. RESULTS In this study, we first provide a pipeline to generate a simulated benchmark transcriptome and the corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance by leveraging the overlapping predictions of different assemblies, we present a new consensus-based ensemble transcriptome assembly approach, ConSemble. CONCLUSIONS Without using a reference genome, ConSemble using four de novo assemblers achieved an accuracy up to twice as high as any individual de novo assembler we compared. When a reference genome is available, ConSemble using four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods.
Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy can improve transcriptome assembly for both de novo and genome-guided assemblers. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/ .
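ConSemble's consensus idea is to keep only predictions that several assemblers agree on. A simple voting sketch can illustrate it. This is not ConSemble's actual script. It assumes, hypothetically, that predictions from different assemblers match only when their sequences are identical, and that a sequence is kept when at least `min_votes` assemblers produce it. The threshold of 3 out of 4 in the example is an illustrative choice.

```python
from collections import Counter

def consensus_vote(assemblies, min_votes):
    """Keep sequences reported by at least `min_votes` assemblers.

    assemblies: dict mapping assembler name -> iterable of predicted sequences.
    Exact string identity decides whether two predictions agree
    (a simplifying assumption for illustration).
    """
    votes = Counter()
    for seqs in assemblies.values():
        votes.update(set(seqs))  # each assembler votes at most once per sequence
    return {seq for seq, n in votes.items() if n >= min_votes}

# Toy predictions from four hypothetical de novo assemblers.
predictions = {
    "asm1": ["MAAA", "MCCC", "MGGG"],
    "asm2": ["MAAA", "MCCC"],
    "asm3": ["MAAA", "MTTT"],
    "asm4": ["MAAA", "MCCC", "MTTT"],
}
print(sorted(consensus_vote(predictions, min_votes=3)))  # ['MAAA', 'MCCC']
```

Raising `min_votes` favours precision (fewer spurious contigs) at the cost of recall. This mirrors the precision gains the abstract reports.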
Affiliation(s)
- Adam Voshall
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Pediatrics, Division of Genetics and Genomics, Boston Children's Hospital/Harvard Medical School, Boston, MA, 02115, USA
- Sairam Behera
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Xiangjun Li
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Xiao-Hong Yu
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Kushagra Kapil
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- John Shanklin
- Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Edgar B Cahoon
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Etsuko N Moriyama
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
8
Analysis of Gene Expression Changes in Plants Grown in Salty Soil in Response to Inoculation with Halophilic Bacteria. Int J Mol Sci 2021; 22:ijms22073611. [PMID: 33807153 PMCID: PMC8036567 DOI: 10.3390/ijms22073611] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/17/2021] [Revised: 03/25/2021] [Accepted: 03/27/2021] [Indexed: 12/24/2022]
Abstract
Soil salinity is an increasing problem for agriculture in many parts of the world. Climate change and irrigation practices have increased salt levels in some farmland soils, decreasing yields. Salt-tolerant plants are therefore needed to feed the world's population. One approach to this problem is genetic engineering to introduce genes conferring salinity tolerance, but this approach has limitations. Another, fairly new approach is the isolation and development of salt-tolerant (halophilic) plant-associated bacteria, which are used as inoculants to stimulate plant growth. Several reports now demonstrate that halophilic inoculants enhance plant growth in salty soil. However, the mechanisms underlying this growth stimulation are not yet clear. Enhanced growth in response to bacterial inoculation is expected to be associated with changes in plant gene expression. In this review, we discuss the current literature and approaches for analyzing altered plant gene expression in response to inoculation with halophilic bacteria. Additionally, we analyze the challenges and limitations of current approaches. A better understanding of the molecular mechanisms underlying enhanced plant growth upon inoculation with salt-tolerant bacteria will significantly improve agriculture in areas affected by saline soils.
9
Cortese IJ, Castrillo ML, Zapata PD, Laczeski ME. Effect of sequence filtering on the genome assembly of Bacillus altitudinis isolated from Ilex paraguariensis [Efecto del filtrado de secuencias en el ensamblado del genoma de Bacillus altitudinis aislado de Ilex paraguariensis]. Acta Biológica Colombiana 2021. [DOI: 10.15446/abc.v26n2.86406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Regardless of the technology used to sequence a genome, sequence filtering is an essential step in which low-quality reads, or low-quality portions of reads, are removed. In assembly, a genome is built by joining short reads into contigs. Some assemblers measure the relationships among sequences of a fixed length (k-mers), which can be affected by the presence of low-quality sequences. A common approach to evaluating assemblies is based on the number of contigs, the length of the longest contig, and the N50 value, defined as the contig length such that contigs of that length or longer contain 50% of the total assembly length. In this context, the present study aimed to evaluate the effect of using raw versus filtered reads on the quality parameters of the genome assembly of Bacillus altitudinis strain 19RS3, isolated from Ilex paraguariensis. The quality of both input files was analyzed with FastQC, and the reads were filtered with Trimmomatic. SPAdes was used for assembly and QUAST for its evaluation. The best assembly of B. altitudinis 19RS3 was obtained from the filtered reads with a k-mer value of 79, which generated 16 contigs longer than 500 bp, an N50 of 931,914 bp, and a longest contig of 966,271 bp.
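The N50 statistic used above to compare assemblies can be computed in a few lines. Sort contig lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly length. The optional `min_len` filter mirrors the 500 bp contig cut-off mentioned in the abstract. The function name and toy lengths are illustrative, and assessment tools may differ in edge-case conventions.

```python
def n50(lengths, min_len=0):
    """N50 of contig lengths, ignoring contigs shorter than min_len.

    Contigs are sorted longest-first; the N50 is the length of the contig at
    which the running total first reaches half of the total length.
    Returns 0 when no contig passes the filter.
    """
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    for n in kept:
        running += n
        if 2 * running >= total:  # integer comparison avoids halving an odd total
            return n
    return 0

print(n50([100, 200, 300, 400, 500]))     # 400
print(n50([100, 600, 700], min_len=500))  # 700
```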
10
De novo RNA sequencing analysis of Aeluropus littoralis halophyte plant under salinity stress. Sci Rep 2020; 10:9148. [PMID: 32499577 PMCID: PMC7272644 DOI: 10.1038/s41598-020-65947-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/20/2019] [Accepted: 05/13/2020] [Indexed: 01/24/2023]
Abstract
The study of salt tolerance mechanisms in halophytes can provide valuable information for crop breeding and plant engineering programs. The aim of the present study was to analyze the whole transcriptome of Aeluropus littoralis in response to salinity stress (200 and 400 mM NaCl) by de novo RNA sequencing. To assemble the transcriptome, Trinity v2.4.0 and Bridger were compared using two k-mer sizes (25 and 32 bp). The de novo transcriptome assembled by Bridger (k-mer 32) was chosen as the final assembly for subsequent analysis. In total, 103,290 transcripts were obtained. Differential expression analysis (log2FC > 1 and FDR < 0.01) identified 1861 differentially expressed transcripts, including 169 up- and 316 down-regulated transcripts in the 200 mM NaCl treatment and 1035 up- and 430 down-regulated transcripts in the 400 mM NaCl treatment compared with the control. In addition, 89 transcripts were common to both treatments. The most important over-represented GO terms among differentially expressed genes (FDR < 0.05) were chitin response, response to abscisic acid, and regulation of the jasmonic acid mediated signaling pathway under 400 mM NaCl, and cell cycle, cell division, and mitotic cell cycle process under 200 mM NaCl. In addition, the phosphatidylcholine biosynthetic process term was common to both salt treatments. Interestingly, under 400 mM NaCl the PRC1 complex, which contributes to chromatin remodeling, was also enriched, along with the vacuole as a general salinity stress-responsive cell component. Among enriched pathways, the MAPK signaling pathway (ko04016) and phytohormone signal transduction (ko04075) were significantly enriched in the 400 mM NaCl treatment, whereas DNA replication (ko03032) was the only pathway significantly enriched in the 200 mM NaCl treatment. Overall, our findings indicate salt-concentration-dependent responses in A. littoralis: well-known salinity stress-related pathways are induced at 400 mM NaCl, whereas less-studied pathways, e.g., cell cycle and DNA replication, are highlighted at 200 mM NaCl.
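The differential-expression thresholds and counts above can be checked with a short sketch. It assumes, as is usual but not stated explicitly in the abstract, that the log2FC cut-off applies to the absolute fold change, so down-regulated transcripts (log2FC < -1) are also captured. The per-transcript tuples are hypothetical. The reported total is consistent with inclusion-exclusion over the two treatments: (169 + 316) + (1035 + 430) - 89 = 485 + 1465 - 89 = 1861.

```python
def split_de(results, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """results: iterable of (transcript_id, log2_fold_change, fdr) tuples.

    Returns (up, down): sets of transcript IDs passing both thresholds.
    """
    rows = list(results)  # materialise so a generator can be scanned twice
    up = {t for t, lfc, fdr in rows if fdr < fdr_cutoff and lfc > lfc_cutoff}
    down = {t for t, lfc, fdr in rows if fdr < fdr_cutoff and lfc < -lfc_cutoff}
    return up, down

def n_distinct_de(de_a, de_b):
    """Distinct DE transcripts across two treatments (inclusion-exclusion)."""
    return len(de_a) + len(de_b) - len(de_a & de_b)

# Hypothetical per-transcript results for two treatments vs. control.
res_200 = [("t1", 2.0, 0.001), ("t2", -1.5, 0.005), ("t3", 3.0, 0.05), ("t4", 0.5, 0.0001)]
res_400 = [("t1", 1.2, 0.002), ("t5", -2.0, 0.0001)]
up200, down200 = split_de(res_200)
up400, down400 = split_de(res_400)
print(n_distinct_de(up200 | down200, up400 | down400))  # 3
```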
11
Gen2EpiGUI: User-Friendly Pipeline for Analyzing Whole-Genome Sequencing Data for Epidemiological Studies of Neisseria gonorrhoeae. Sex Transm Dis 2020; 47:e42-e44. [DOI: 10.1097/olq.0000000000001206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
12
Cogne Y, Gouveia D, Chaumot A, Degli-Esposti D, Geffard O, Pible O, Almunia C, Armengaud J. Proteogenomics-Guided Evaluation of RNA-Seq Assembly and Protein Database Construction for Emergent Model Organisms. Proteomics 2020; 20:e1900261. [PMID: 32249536 DOI: 10.1002/pmic.201900261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/08/2019] [Revised: 03/24/2020] [Indexed: 11/10/2022]
Abstract
Proteogenomics is gaining momentum because genomics, transcriptomics, and proteomics can now be readily performed on any new species. This approach allows key alterations to molecular pathways to be identified when comparing conditions. For animals and plants, RNA-seq-informed proteomics is the most popular means of interpreting tandem mass spectrometry spectra acquired for species whose genome has not yet been sequenced. It relies on high-performance de novo RNA-seq assembly and optimized translation strategies. Here, several pre-treatments of Illumina RNA-seq reads before assembly are explored, with the aim of translating the resulting contigs into useful polypeptide sequences. Using experimental transcriptomics and proteomics datasets acquired for individual Gammarus fossarum freshwater crustaceans, the most relevant procedure is defined by the ratio of MS/MS spectra assigned to peptide sequences. Removing reads with a mean quality score below 17 (which represents a single probable nucleotide error on 150-bp reads) prior to assembly increases the proteomics outcome. The best translation with TransDecoder is achieved with a minimal open reading frame length of 50 amino acids and systematic selection of ORFs longer than 900 nucleotides. Using these parameters, transcriptome assembly and translation informed by proteomics pave the way to further improvements in proteogenomics.
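The read pre-treatment that worked best above discards reads whose mean quality score is below 17 before assembly. It can be sketched as follows. The sketch assumes Phred+33-encoded FASTQ with four lines per record and takes the arithmetic mean of the per-base Phred scores. The helper names are hypothetical, and in practice this step is usually done with an existing read-filtering tool rather than custom code.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual

def mean_phred(qual, offset=33):
    """Arithmetic mean of Phred scores decoded from a Phred+33 quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def filter_by_mean_quality(records, min_mean_q=17):
    """Keep reads whose mean Phred score is at least min_mean_q."""
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_q]

# '5' encodes Phred 20 and '+' encodes Phred 10 in Phred+33.
fastq = "@r1\nACGT\n+\n5555\n@r2\nACGT\n+\n++++\n"
kept = filter_by_mean_quality(parse_fastq(fastq.splitlines()))
print([read_id for read_id, _, _ in kept])  # ['r1']
```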
Affiliation(s)
- Yannick Cogne
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Duarte Gouveia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Arnaud Chaumot
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Davide Degli-Esposti
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Geffard
- INRAE, UR RiverLY Laboratoire d'écotoxicologie, Centre de Lyon-Villeurbanne, Villeurbanne, F-69625, France
- Olivier Pible
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Christine Almunia
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
- Jean Armengaud
- Université Paris Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé, SPI, 30200, Bagnols-sur-Cèze, France
13
Klein AH, Ballard KR, Storey KB, Motti CA, Zhao M, Cummins SF. Multi-omics investigations within the Phylum Mollusca, Class Gastropoda: from ecological application to breakthrough phylogenomic studies. Brief Funct Genomics 2020; 18:377-394. [PMID: 31609407 DOI: 10.1093/bfgp/elz017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/18/2019] [Revised: 07/06/2019] [Accepted: 07/15/2019] [Indexed: 12/22/2022]
Abstract
Gastropods are the largest and most diverse class of molluscs and include species that are well studied in the areas of taxonomy, aquaculture, biomineralization, ecology, microbiome and health. Gastropod research has been expanding since the mid-2000s, largely owing to large-scale data integration from next-generation sequencing and mass spectrometry, with which transcripts, proteins and metabolites can be readily explored systematically. Correspondingly, the resulting volume of data has added a great deal of complexity to data organization, visualization and interpretation. Here, we review recent advances in gastropod omics ('gastropodomics') research drawn from hundreds of publications and online genomics databases. By summarizing the currently available public data, we offer insights into the design of useful data-integration tools and strategies for future comparative omics studies. Additionally, we discuss the future of omics applications in aquaculture, natural pharmaceutical biodiscovery and pest management, as well as in monitoring the impact of environmental stressors.
Affiliation(s)
- Anne H Klein
- Genecology Research Centre, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
- Kaylene R Ballard
- Genecology Research Centre, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
- Kenneth B Storey
- Institute of Biochemistry & Department of Biology, Carleton University, Ottawa, ON, Canada K1S 5B6
- Cherie A Motti
- Australian Institute of Marine Science (AIMS), Cape Ferguson, Townsville Queensland 4810, Australia
- Min Zhao
- Genecology Research Centre, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
- Scott F Cummins
- Genecology Research Centre, University of the Sunshine Coast, Maroochydore DC, Queensland 4558, Australia
14
Evaluation of Seven Different RNA-Seq Alignment Tools Based on Experimental Data from the Model Plant Arabidopsis thaliana. Int J Mol Sci 2020; 21:ijms21051720. [PMID: 32138290 PMCID: PMC7084517 DOI: 10.3390/ijms21051720] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/30/2020] [Revised: 02/28/2020] [Accepted: 02/29/2020] [Indexed: 01/15/2023]
Abstract
Quantification of gene expression is crucial to connect genome sequences with phenotypic and physiological data. RNA-Sequencing (RNA-Seq) has taken a prominent role in the study of transcriptomic responses of plants to various environmental and genetic perturbations. However, comparative tests of different tools for RNA-Seq read mapping and quantification have been performed mainly on data from animals or humans, which necessarily neglects, for example, the large genetic variability among natural accessions within plant species. Here, we compared seven computational tools for their ability to map and quantify Illumina single-end reads from the Arabidopsis thaliana accessions Columbia-0 (Col-0) and N14. Between 92.4% and 99.5% of all reads were mapped to the reference genome or transcriptome, and the raw count distributions obtained from the different mappers were highly correlated. When DESeq2 was used to determine differential gene expression (DGE) from these read counts between plants exposed to 20 °C or 4 °C, the results showed a large pairwise overlap between the mappers. Interestingly, when the commercial CLC software was used with its own DGE module instead of DESeq2, strongly diverging results were obtained. All tested mappers provided highly similar results for mapping Illumina reads of two polymorphic Arabidopsis accessions to the reference genome or transcriptome, and for the determination of DGE when the same software was used for processing.
15
Transcriptome Landscape Variation in the Genus Thymus. Genes (Basel) 2019; 10:genes10080620. [PMID: 31426352 PMCID: PMC6723042 DOI: 10.3390/genes10080620] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/10/2019] [Revised: 07/31/2019] [Accepted: 08/12/2019] [Indexed: 12/13/2022]
Abstract
Within the Lamiaceae family, the genus Thymus is economically important owing to its medicinal and aromatic properties. Most molecular research on Thymus has focused on determining the phylogenetic relationships between species, but no published work has examined the evolution of the transcriptome across the genus to elucidate genes involved in terpenoid biosynthesis. Hence, in this study, the transcriptomes of five different Thymus species were generated and analyzed to mine putative genes involved in thymol and carvacrol biosynthesis. High-throughput sequencing produced ~43 million high-quality reads per sample, which were assembled de novo using several tools and then subjected to quality evaluation. The best assembly for each species was used as the query to search the UniProt, KEGG (Kyoto Encyclopedia of Genes and Genomes), COG (Clusters of Orthologous Groups) and TF (Transcription Factors) databases. Mining the transcriptomes identified 592 single-copy orthogroups, which were used for phylogenetic analysis. The data strongly support a close genetic relationship between Thymus vulgaris and Thymus daenensis. Additionally, this study dates the speciation events to between 1.5–2.1 and 9–10.2 MYA, depending on the methodology. Our study provides a global overview of genes related to the terpenoid pathway in Thymus and can help establish an understanding of the relationships among Thymus species.
16
Nuamtanong S, Reamtong O, Phuphisut O, Chotsiri P, Malaithong P, Dekumyoy P, Adisakwattana P. Transcriptome and excretory-secretory proteome of infective-stage larvae of the nematode Gnathostoma spinigerum reveal potential immunodiagnostic targets for development. Parasite 2019; 26:34. [PMID: 31166909 PMCID: PMC6550564 DOI: 10.1051/parasite/2019033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/18/2018] [Accepted: 05/16/2019] [Indexed: 01/02/2023]
Abstract
Background: Gnathostoma spinigerum is a harmful parasitic nematode that causes severe morbidity and mortality in humans and animals. Effective drugs, vaccines and reliable diagnostic methods are needed to prevent and control the associated diseases; however, the lack of genome, transcriptome and proteome databases remains a major limitation. In this study, transcriptomic and secretomic analyses of advanced third-stage larvae of G. spinigerum (aL3Gs) were performed using next-generation sequencing, bioinformatics and proteomics. Results: An analysis that integrated transcriptome and bioinformatics data to predict excretory–secretory proteins (ESPs) classified 171 and 292 proteins into the classical and non-classical secretory groups, respectively. Proteins with proteolytic (metalloprotease), cell-signaling regulatory (e.g., kinases and phosphatases) and metabolic regulatory (e.g., glucose and lipid metabolism) functions were significantly upregulated in the transcriptome and secretome. A two-dimensional (2D) immunomic analysis of aL3Gs-ESPs with sera from G. spinigerum-infected humans and from related helminthiases suggested that the serine protease inhibitor (serpin) was a promising antigenic target for the further development of gnathostomiasis immunodiagnostic methods. Conclusions: The transcriptome and excretory–secretory proteome of aL3Gs can facilitate understanding of the basic molecular biology of the parasite and the identification of multiple associated factors, possibly promoting the discovery of novel drugs and vaccines. The 2D-immunomic analysis identified serpin, a protein secreted by aL3Gs, as an interesting immunodiagnostic candidate that warrants immediate evaluation and validation.
Affiliation(s)
- Supaporn Nuamtanong
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Onrapak Reamtong
- Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Orawan Phuphisut
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Palang Chotsiri
- Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok 10400, Thailand
- Preeyarat Malaithong
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Paron Dekumyoy
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
- Poom Adisakwattana
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok 10400, Thailand
17
Hölzer M, Marz M. De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers. Gigascience 2019; 8:giz039. [PMID: 31077315 PMCID: PMC6511074 DOI: 10.1093/gigascience/giz039] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 08/14/2018] [Revised: 12/21/2018] [Accepted: 03/09/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND In recent years, massively parallel complementary DNA sequencing (RNA sequencing [RNA-Seq]) has emerged as a fast, cost-effective, and robust technology to study entire transcriptomes in various ways. In particular, for non-model organisms and in the absence of an appropriate reference genome, RNA-Seq is used to reconstruct the transcriptome de novo. Although de novo transcriptome assembly of non-model organisms has been on the rise recently and new tools are frequently being developed, there is still a knowledge gap about which assembly software should be used to build a comprehensive de novo assembly. RESULTS Here, we present a large-scale comparative study in which 10 de novo assembly tools are applied to 9 RNA-Seq data sets spanning different kingdoms of life. Overall, we built >200 single assemblies and evaluated their performance on a combination of 20 biological-based and reference-free metrics. Our study is accompanied by a comprehensive and extensible Electronic Supplement that summarizes all data sets, assembly execution instructions, and evaluation results. Trinity, SPAdes, and Trans-ABySS, followed by Bridger and SOAPdenovo-Trans, generally outperformed the other tools compared. Moreover, we observed species-specific differences in the performance of each assembler. No tool delivered the best results for all data sets. CONCLUSIONS We recommend a careful choice and normalization of evaluation metrics to select the best assembly results as a critical step in the reconstruction of a comprehensive de novo transcriptome assembly.
Affiliation(s)
- Martin Hölzer
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Friedrich Schiller University, Leutragraben 1, 07743 Jena, Germany
- Manja Marz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University, Leutragraben 1, 07743 Jena, Germany
- European Virus Bioinformatics Center, Friedrich Schiller University, Leutragraben 1, 07743 Jena, Germany
- FLI Leibniz Institute for Age Research, Beutenbergstraße 11, 07743 Jena, Germany
18
Seoane P, Espigares M, Carmona R, Polonio Á, Quintana J, Cretazzo E, Bota J, Pérez-García A, Dios Alché JD, Gómez L, Claros MG. TransFlow: a modular framework for assembling and assessing accurate de novo transcriptomes in non-model organisms. BMC Bioinformatics 2018; 19:416. [PMID: 30453874 PMCID: PMC6245506 DOI: 10.1186/s12859-018-2384-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/02/2022]
Abstract
BACKGROUND Advances in high-throughput sequencing technologies are enabling de novo transcriptome assembly for a growing number of new organisms. Some degree of automation and evaluation is required to ensure reproducibility, repeatability and the selection of the best possible transcriptome. Workflows and pipelines are becoming an absolute requirement for this purpose, but evaluating de novo transcriptome assemblies in organisms lacking a sequenced genome remains an unsolved problem. Here we describe TransFlow, an automated, reproducible and flexible framework for this task. RESULTS TransFlow, with its five independent modules, was designed to build different workflows depending on the nature of the original reads. This architecture accommodates different combinations of Illumina and Roche/454 sequencing data and can be extended to other sequencing platforms. Its capabilities are illustrated by the selection of reliable plant reference transcriptomes and the assembly of six transcriptomes (three case studies for grapevine leaves, olive tree pollen and chestnut stem, and three more for haustorium, epiphytic structures and their combination in the phytopathogenic fungus Podosphaera xanthii). The Arabidopsis and poplar transcriptomes proved to be the best references. A common finding across the de novo assemblies is that Illumina paired-end reads 100 nt in length assembled with OASES can provide reliable transcriptomes, whereas longer reads contribute noticeably only when they complement a set of short single reads. CONCLUSIONS TransFlow can handle up to 181 different assembly strategies. Evaluation based on principal component analyses allows it to self-adapt to different sets of reads and provide a suitable transcriptome for each combination of reads and assemblers.
As a result, each case study shows its own behaviour and prioritises its own evaluation parameters, and TransFlow provides an objective, automated way of detecting the best transcriptome within a pool of candidates. Sequencing data type and quantity (preferably several hundred million 2×100 nt reads or longer), assemblers (OASES for Illumina; MIRA4 and EULER-SR reconciled with CAP3 for Roche/454) and strategy (preferably scaffolding with OASES, and probably merging with Roche/454 data when available) emerge as the most influential factors.
Affiliation(s)
- Pedro Seoane
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Campus de Teatinos s/n, Malaga, 29071 Spain
- Marina Espigares
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Campus de Teatinos s/n, Malaga, 29071 Spain
- Rosario Carmona
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants. Estación Experimental del Zaidín. CSIC, Prof. Albareda, 1, Granada, 18160 Spain
- Álvaro Polonio
- Departamento de Microbiología, and Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus de Teatinos s/n, Malaga, 29071 Spain
- Julia Quintana
- Department of Chemistry and Biochemistry, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA, 01609-2280 USA
- Enrico Cretazzo
- Instituto Andaluz de Investigación y Formación Agraria (IFAPA), Centro de Churriana, Cortijo de la Cruz s/n, Churriana, 29140 Spain
- Josefina Bota
- Grup de Recerca en Biologia de les Plantes en Condicions Mediterrànies, Departament de Biologia, Universitat de les Illes Balears, Carretera de Valldemossa, km 7.5, Palma de Mallorca, 07122 Spain
- Alejandro Pérez-García
- Departamento de Microbiología, and Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga, Consejo Superior de Investigaciones Científicas (IHSM-UMA-CSIC), Campus de Teatinos s/n, Malaga, 29071 Spain
- Juan de Dios Alché
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants. Estación Experimental del Zaidín. CSIC, Prof. Albareda, 1, Granada, 18160 Spain
- Luis Gómez
- Departamento de Sistemas y Recursos Naturales, ETSI Forestal, de Montes y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, Madrid, 28040 Spain
- CBGP, INIA-Universidad Politécnica de Madrid, Campus de Montegancedo, Pozuelo de Alarcón, 28223 Spain
- M. Gonzalo Claros
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Campus de Teatinos s/n, Malaga, 29071 Spain
19
Phuphisut O, Ajawatanawong P, Limpanont Y, Reamtong O, Nuamtanong S, Ampawong S, Chaimon S, Dekumyoy P, Watthanakulpanich D, Swierczewski BE, Adisakwattana P. Transcriptomic analysis of male and female Schistosoma mekongi adult worms. Parasit Vectors 2018; 11:504. [PMID: 30201055 PMCID: PMC6131826 DOI: 10.1186/s13071-018-3086-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/14/2018] [Accepted: 08/29/2018] [Indexed: 12/23/2022]
Abstract
Background Schistosoma mekongi is one of five major causative agents of human schistosomiasis and is endemic to communities along the Mekong River in southern Lao People’s Democratic Republic (Laos) and northern Cambodia. Sporadic cases of schistosomiasis have been reported in travelers and immigrants who have visited endemic areas. The biology and molecular biology of S. mekongi are poorly understood, and few S. mekongi gene and transcript sequences are available in public databases. Results Transcriptome sequencing (RNA-Seq) data from male and female S. mekongi adult worms (three biological replicates per sex) were analyzed; approximately 304.9 and 363.3 million high-quality clean reads with Q30 > 90% were obtained from male and female adult worms, respectively. A total of 119,604 contigs were assembled, with an average length of 1273 nt and an N50 of 2017 nt. From the contigs, 20,798 annotated protein sequences and 48,256 annotated transcript sequences were obtained using BLASTP and BLASTX searches against the UniProt Trematoda database. A total of 4658 and 3509 transcripts were predominantly expressed in male and female worms, respectively. Male-biased transcripts were mostly involved in structural organization, while female-biased transcripts were typically involved in cell differentiation and egg production. Interestingly, pathway enrichment analysis suggested that genes involved in the phosphatidylinositol signaling pathway may play important roles in the cellular processes and reproductive systems of S. mekongi worms. Conclusions We present comparative transcriptomic analyses of male and female S. mekongi adult worms, which provide a global view of the S. mekongi transcriptome as well as insights into differentially expressed genes associated with each sex. This work provides valuable information and sequence resources for future studies of gene function and for ongoing whole-genome sequencing efforts in S. mekongi.
Affiliation(s)
- Orawan Phuphisut
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Pravech Ajawatanawong
- Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand
- Yanin Limpanont
- Department of Social and Environmental Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Onrapak Reamtong
- Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Supaporn Nuamtanong
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Sumate Ampawong
- Department of Tropical Pathology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Salisa Chaimon
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Paron Dekumyoy
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Dorn Watthanakulpanich
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Brett E Swierczewski
- Department of Enteric Diseases, Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Poom Adisakwattana
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
20
Ward MJ, Rokyta DR. Venom-gland transcriptomics and venom proteomics of the giant Florida blue centipede, Scolopendra viridis. Toxicon 2018; 152:121-136. [PMID: 30086358 DOI: 10.1016/j.toxicon.2018.07.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/06/2018] [Revised: 07/25/2018] [Accepted: 07/31/2018] [Indexed: 12/19/2022]
Abstract
The limited number of centipede venom characterizations has revealed a rich diversity of toxins, and recent work has suggested that centipede toxins may be diversifying more rapidly than previously thought. Additionally, many recognized challenges in venomics research, including assembly and annotation methods, toxin quantification, and the ability to provide biological or technical replicates, have yet to be addressed in centipede venom characterizations. We applied high-throughput, quantitative transcriptomic and proteomic methods to two individual Scolopendra viridis centipedes from North Florida. We identified 39 proteomically confirmed toxins and 481 nontoxins expressed in the venom gland of S. viridis. The most abundant toxins in the venom of S. viridis belonged to calcium and potassium ion-channel toxins, venom allergens, metalloproteases and β-pore-forming toxins. We compared our results to the previously characterized S. viridis from Morelos, Mexico, and found only five proteomically confirmed toxins common to both localities, suggesting either extreme toxin divergence within S. viridis or that these populations may represent entirely different species. By using multiple assembly and annotation methods, we generated a comprehensive and quantitative reference transcriptome and proteome of a Scolopendromorpha centipede species, while overcoming some of the challenges present in venomics research.
Affiliation(s)
- Micaiah J Ward
- Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
- Darin R Rokyta
- Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
21
von Reumont BM. Studying Smaller and Neglected Organisms in Modern Evolutionary Venomics Implementing RNASeq (Transcriptomics)-A Critical Guide. Toxins (Basel) 2018; 10:toxins10070292. [PMID: 30012955 PMCID: PMC6070909 DOI: 10.3390/toxins10070292] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/25/2018] [Revised: 07/06/2018] [Accepted: 07/13/2018] [Indexed: 12/20/2022]
Abstract
Venoms are key evolutionary adaptations that species employ for defense, predation or competition. However, the processes and forces that drive the evolution of venoms and their toxin components remain understudied in many respects. In particular, the venoms of many smaller, neglected (mostly invertebrate) organisms have not been characterized in detail, especially with modern methods. For the majority of these taxa, even their biology is only vaguely known. Modern evolutionary venomics addresses the question of how venoms evolve by applying a plethora of -omics methods. These methods have recently become so sensitive and refined that the venoms of smaller, neglected organisms are now more accessible to comparative study. More knowledge about these taxa is essential to better understand venom evolution in general. The methodological core pillars of integrative evolutionary venomics are genomics, transcriptomics and proteomics, complemented by functional morphology and by protein synthesis and activity tests. This manuscript focuses on transcriptomics (or RNASeq) as one toolbox for describing venom evolution in smaller, neglected taxa. It provides a hands-on guide that discusses a generalized RNASeq workflow, which can be adapted to individual projects. For neglected and small taxa, generalized recommendations are difficult to give, and conclusions need to be drawn case by case. In the context of evolutionary venomics, this overview highlights both the critical points and the promise of RNASeq analyses. Methodologically, these concern the impact of read processing, possible improvements from performing multiple and merged assemblies, and adequate quantification of expressed transcripts. Readers are guided to reappraise their hypotheses on venom evolution in smaller organisms and how robustly these can be tested with the current transcriptomics toolbox.
The complementary approach of combining transcriptomics with proteomics in particular, but also with genomics, is discussed as well. As recently shown, comparative proteomics is, for example, particularly important for preventing false-positive identifications of putative toxin transcripts. Finally, future directions in transcriptomics, such as applying third-generation sequencing strategies to overcome the difficulties of short-read assemblies, are briefly addressed.
Affiliation(s)
- Björn Marcus von Reumont
- Justus Liebig University of Giessen, Institute for Insect Biotechnology, Heinrich Buff Ring 58, 35392 Giessen, Germany
- Natural History Museum, Department of Life Sciences, Cromwell Rd, London SW7 5BD, UK
22
Evaluating the Performance of De Novo Assembly Methods for Venom-Gland Transcriptomics. Toxins (Basel) 2018; 10:toxins10060249. [PMID: 29921759 PMCID: PMC6024825 DOI: 10.3390/toxins10060249] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 04/27/2018] [Revised: 06/14/2018] [Accepted: 06/15/2018] [Indexed: 11/17/2022]
Abstract
Venom-gland transcriptomics is a key tool in the study of the evolution, ecology, function, and pharmacology of animal venoms. In particular, gene-expression variation and coding sequences gained through transcriptomics provide key information for explaining functional venom variation over both ecological and evolutionary timescales. The accuracy and usefulness of inferences made through transcriptomics, however, are limited by the accuracy of the transcriptome assembly, which is a bioinformatic problem with several possible solutions. Several methods have been employed to assemble venom-gland transcriptomes, with the Trinity assembler being the most commonly applied. Although there is previous evidence of variation in performance among assembly software, particularly regarding recovery of difficult-to-assemble multigene families such as snake venom metalloproteinases, much work to date still employs a single assembly method. We evaluated the performance of several commonly used de novo assembly methods in recovering both nontoxin transcripts and complete, high-quality venom-gene transcripts across eleven snake and four scorpion transcriptomes. We varied the k-mer sizes used by some assemblers to evaluate the impact of k-mer length on transcript recovery. We showed that nontoxin and toxin transcripts are best recovered with different assembly software: SDT at smaller k-mer lengths and Trinity were best for nontoxin recovery, whereas a combination of SeqMan NGen and a seed-and-extend approach implemented in Extender was the best means of recovering a complete set of toxin transcripts. In particular, Extender was the only method tested that could assemble multiple isoforms of the diverse snake venom metalloproteinase family, while traditional approaches such as Trinity recovered at most one metalloproteinase transcript.
Our work demonstrated that traditional metrics of assembly performance do not predict performance in recovering complete, high-quality toxin genes. Instead, effective venom-gland transcriptomic studies should combine and quality-filter the results of several assemblers with different algorithmic strategies.
23
Hoang NV, Furtado A, Thirugnanasambandam PP, Botha FC, Henry RJ. De novo assembly and characterizing of the culm-derived meta-transcriptome from the polyploid sugarcane genome based on coding transcripts. Heliyon 2018; 4:e00583. [PMID: 29862346 PMCID: PMC5968133 DOI: 10.1016/j.heliyon.2018.e00583] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/14/2017] [Revised: 03/02/2018] [Accepted: 03/16/2018] [Indexed: 12/31/2022]
Abstract
Sugarcane biomass has been used for sugar, bioenergy and biomaterial production. The majority of sugarcane biomass comes from the culm, which makes it important to understand the genetic control of biomass production in this part of the plant. In an earlier study, a meta-transcriptome of the culm was obtained from about one billion paired-end (150 bp) reads of deep RNA sequencing of samples from 20 diverse sugarcane genotypes, by combining de novo assemblies from different assemblers and different settings. Although many genes could be recovered, this resulted in a large combined assembly, which created the need for clustering to reduce transcript redundancy while maintaining gene content. Here, we present a comprehensive analysis of the effect of different assembly settings and clustering methods on de novo assembly, annotation and transcript profiling, focusing especially on the coding transcripts from the highly polyploid sugarcane genome. The new coding-sequence-based transcript clustering gave a better representation of transcripts than the earlier approach, yielding 121,987 contigs, which included 78,052 main and 43,935 alternative transcripts. About 73%, 67%, 61% and 10% of the transcriptome was annotated against the NCBI NR protein database, GO terms, orthologous groups and KEGG orthologies, respectively. Using this set for a differential gene expression analysis between young and mature sugarcane culm tissues, a total of 822 transcripts were found to be differentially expressed, including key transcripts involved in sugar/fiber accumulation in sugarcane. In the absence of a whole-genome sequence for sugarcane, the availability of a well-annotated culm-derived meta-transcriptome obtained through deep sequencing provides useful information on coding genes specific to the sugarcane culm and will certainly contribute to understanding carbon partitioning and biomass accumulation in the sugarcane culm.
Affiliation(s)
- Nam V. Hoang
- College of Agriculture and Forestry, Hue University, Hue, Vietnam
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Queensland, 4072, Australia
- Prathima P. Thirugnanasambandam
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Queensland, 4072, Australia
- ICAR - Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
- Frederik C. Botha
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Queensland, 4072, Australia
- Sugar Research Australia, Indooroopilly, Queensland, Australia
- Robert J. Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, Queensland, 4072, Australia
24
Dhaygude K, Trontti K, Paviala J, Morandin C, Wheat C, Sundström L, Helanterä H. Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta. PeerJ 2017; 5:e3998. [PMID: 29177112 PMCID: PMC5701548 DOI: 10.7717/peerj.3998] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/14/2017] [Accepted: 10/17/2017] [Indexed: 12/21/2022]
Abstract
Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and of the female castes (queens and workers), in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans) in producing de novo assemblies. Second, we annotate the resulting expressed contigs against the currently published genomes of ants and other insects, including the honeybee, to retain genes with annotation evidence of being true genes. Our pipeline resulted in a final assembly of 39,262 mRNA transcripts in total, with an average coverage of >300X, belonging to 17,496 unique genes with annotations in related ant species. Of these genes, 536 were unique to a single caste or sex, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontology analyses. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly approach can be adapted to any non-model species that has genomic resources available from a related taxon.
Affiliation(s)
- Kishor Dhaygude
  - Centre of Excellence in Biological Interactions, Department of Biosciences, University of Helsinki, Helsinki, Finland
- Kalevi Trontti
  - Department of Biosciences, Neurogenomics Laboratory, University of Helsinki, Helsinki, Finland
- Jenni Paviala
  - Centre of Excellence in Biological Interactions, Department of Biosciences, University of Helsinki, Helsinki, Finland
- Claire Morandin
  - Centre of Excellence in Biological Interactions, Department of Biosciences, University of Helsinki, Helsinki, Finland
- Christopher Wheat
  - Department of Zoology Ecology, Stockholm University, Stockholm, Sweden
- Liselotte Sundström
  - Centre of Excellence in Biological Interactions, Department of Biosciences, University of Helsinki, Helsinki, Finland
  - Tvärminne Zoological Station, University of Helsinki, Hanko, Finland
- Heikki Helanterä
  - Centre of Excellence in Biological Interactions, Department of Biosciences, University of Helsinki, Helsinki, Finland
  - Tvärminne Zoological Station, University of Helsinki, Hanko, Finland

25
Gates K, Sandoval-Castillo J, Bernatchez L, Beheregaray LB. De novo transcriptome assembly and annotation for the desert rainbowfish (Melanotaenia splendida tatei) with comparison with candidate genes for future climates. Mar Genomics 2017; 35:63-68. [DOI: 10.1016/j.margen.2017.05.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/30/2017] [Accepted: 05/15/2017] [Indexed: 01/25/2023]

26
Challenges and advances for transcriptome assembly in non-model species. PLoS One 2017; 12:e0185020. [PMID: 28931057 PMCID: PMC5607178 DOI: 10.1371/journal.pone.0185020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/30/2017] [Accepted: 09/04/2017] [Indexed: 12/28/2022]
Abstract
Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly, which maps reads to a reference to assign them prior to assembly. Given the limits of mapping reads to a highly divergent reference, as is frequently the case for non-model species, we evaluate whether using blastn for read assignment would outperform mapping methods in such situations (>15% divergence). We demonstrate its high performance using simulated reads with lengths corresponding to those generated by the most common sequencing platforms, over a realistic range of genetic divergence (0% to 30%). Here we focus on gene identification rather than on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes at 0% divergence, irrespective of read length; however, the read assignment rate decreases as divergence increases and as read length decreases. Nevertheless, 92.6% of genes are still recovered at 30% divergence, irrespective of read length. This analysis also produces a categorization of genes according to their assignment, and suggests guidelines for data processing prior to comparative transcriptomic and gene expression analyses, to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performance of de novo assembly alone with that of de novo assembly combined with a transcriptome-guided assembly based on blastn, both by simulation and empirically, using data from a cyprinid fish species and an oak species. In every simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods.
Combining de novo assembly with a related reference transcriptome for read assignment also addresses the bias and errors in contigs caused by relying on a related reference alone. Empirical data corroborate these findings for the transcriptomes of two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, of the 31,944 genes known from Danio rerio, the guided and de novo assemblies recover 20,605 and 20,032 genes, respectively, but the guided approach performs much better on both contiguity and completeness metrics. For the oak, of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the guided approach detects 16,326 genes whereas the de novo assembly detects only 9,385.

27
Lopez L, Wolf EM, Pires JC, Edger PP, Koch MA. Molecular Resources from Transcriptomes in the Brassicaceae Family. Front Plant Sci 2017; 8:1488. [PMID: 28900436 PMCID: PMC5581910 DOI: 10.3389/fpls.2017.01488] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/16/2017] [Accepted: 08/11/2017] [Indexed: 06/07/2023]
Abstract
The rapidly falling costs and increasing availability of large DNA sequence data sets facilitate fast and affordable mining of large molecular marker data sets for comprehensive evolutionary studies. The Brassicaceae (mustards) are an important, species-rich plant family with taxa distributed worldwide and a complex evolutionary history. We mined simple sequence repeats (SSRs) from de novo assembled transcriptomes of 19 species across the Brassicaceae in order to study SSR evolution and provide comprehensive sets of molecular markers for genetic studies within the family. Moreover, we selected the genus Cochlearia to test the transferability and polymorphism of these markers among species. Additionally, we annotated the Cochlearia pyrenaica transcriptome in order to identify the position of each of the mined SSRs. While we introduce a new set of tools that will further enable evolutionary studies across the Brassicaceae, we also discuss some broader aspects of SSR evolution. Overall, we developed 2,012 ready-to-use SSR markers, with their respective primers, in 19 Brassicaceae species, and a high-quality annotated transcriptome for C. pyrenaica. As indicated by our transferability test within the genus Cochlearia, these SSRs are transferable to species within the genus, greatly increasing the number of species that can be targeted. Our polymorphism results also showed substantial levels of variability for these markers. Finally, despite the family's complex evolutionary history, SSR evolution across the Brassicaceae is highly conserved, and we found no deviation from patterns reported in other angiosperms.
Affiliation(s)
- Lua Lopez
  - Biodiversity and Plant Systematics, Centre of Organismal Studies, University of Heidelberg, Heidelberg, Germany
- Eva M. Wolf
  - Biodiversity and Plant Systematics, Centre of Organismal Studies, University of Heidelberg, Heidelberg, Germany
- J. Chris Pires
  - Division of Biological Sciences, University of Missouri, Columbia, MO, United States
- Patrick P. Edger
  - Department of Horticulture, Michigan State University, East Lansing, MI, United States
  - Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, MI, United States
- Marcus A. Koch
  - Biodiversity and Plant Systematics, Centre of Organismal Studies, University of Heidelberg, Heidelberg, Germany

28
Johnson KL, Cassin AM, Lonsdale A, Bacic A, Doblin MS, Schultz CJ. Pipeline to Identify Hydroxyproline-Rich Glycoproteins. Plant Physiol 2017; 174:886-903. [PMID: 28446635 PMCID: PMC5462032 DOI: 10.1104/pp.17.00294] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/06/2017] [Accepted: 04/21/2017] [Indexed: 05/14/2023]
Abstract
Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as crucial biological functions are uncovered for more of these proteins. In plants, IDPs are implicated in stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), has characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from sources ranging from the protein-rich walls of chlorophyte algae to the cellulose-rich walls of embryophytes. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Although HRGPs are commonly divided into arabinogalactan proteins, extensins, and proline-rich proteins, in reality a continuum of structures exists within this diverse and heterogeneous superfamily. The inability to classify HRGPs accurately leads to inconsistent gene ontology annotations, limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline that classifies HRGPs into 23 descriptive subclasses. MAAB was validated using available genomic resources and then applied to the 1000 Plants transcriptome project (www.onekp.com) data set. A significant improvement in the detection of HRGPs was observed when a multiple-k-mer transcriptome assembly methodology was used. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
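The two kinds of evidence named in MAAB, motif content and amino acid bias, can be illustrated with a minimal sketch. It is not the published MAAB pipeline and applies no classification thresholds; the Pro/Ala/Ser/Thr ("PAST") composition, the extensin-like Ser-Pro(3+) motif, the example fragment and the function names are all illustrative assumptions. Because hydroxyproline arises post-translationally, it appears as proline (P) in the encoded sequence.

```python
import re
from collections import Counter


def residue_fraction(protein: str, residues: str = "PAST") -> float:
    """Fraction of positions in `protein` occupied by any residue in `residues`."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    counts = Counter(protein)
    return sum(counts[r] for r in set(residues.upper())) / len(protein)


def count_motif(protein: str, pattern: str) -> int:
    """Number of non-overlapping matches of a regex motif in `protein`."""
    return len(re.findall(pattern, protein.upper()))


# Hypothetical extensin-like fragment with two SPPPP repeats.
fragment = "MSPPPPKSPPPPYAPAP"
print(f"PAST fraction: {residue_fraction(fragment):.2f}")   # composition bias
print("SP(3+) motifs:", count_motif(fragment, r"SP{3,}"))  # Ser followed by >= 3 Pro
```

A classifier in this spirit would combine such per-sequence scores with class-specific cut-offs; choosing those cut-offs is exactly what a validated pipeline provides and what this sketch deliberately leaves out.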
Affiliation(s)
- Kim L Johnson
  - Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Andrew M Cassin
  - Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Andrew Lonsdale
  - Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Antony Bacic
  - Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Monika S Doblin
  - Australian Research Council Centre of Excellence in Plant Cell Walls, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Carolyn J Schultz
  - School of Agriculture, Food, and Wine, University of Adelaide, Waite Research Institute, Glen Osmond, South Australia 5064, Australia

29
Hoang NV, Furtado A, Mason PJ, Marquardt A, Kasirajan L, Thirugnanasambandam PP, Botha FC, Henry RJ. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing. BMC Genomics 2017; 18:395. [PMID: 28532419 PMCID: PMC5440902 DOI: 10.1186/s12864-017-3757-8] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 10/04/2016] [Accepted: 05/03/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Despite the economic importance of sugarcane in sugar and bioenergy production, no reference genome is yet available. Most sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short reads; hence, knowledge of the sugarcane transcriptome is limited with respect to transcript length and the number of transcript isoforms. RESULTS The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues at different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of all sequences were annotated against the Gene Ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA sequencing (RNA-Seq) of the internode samples from the same experiment, and with public databases, showed that the Iso-Seq method recovered more full-length transcript isoforms and gave a higher N50 and a greater average length of the largest 1,000 proteins, whereas RNA-Seq captured a greater representation of gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs; the unmatched proportions were attributed, respectively, to the inclusion of leaf/root tissues and normalization in PacBio, and to the representation of more gene content and RNA classes in the de novo assembly.
About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating high conservation of orthologs in the genic regions of the two genomes. CONCLUSIONS The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions, and will serve as a reference database for analysis of transcript expression in sugarcane.
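N50, one of the assembly metrics compared in this abstract, is the length L such that sequences of length at least L together contain at least half of all assembled bases. A minimal sketch of that computation, together with the mean length of the k longest sequences (the abstract uses k = 1,000 for proteins), is given below; the function names and the toy lengths are illustrative only and are not taken from the study.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold >= 50% of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Accumulate bases from the longest sequence downwards.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running reaches total on the last item")


def mean_of_longest(lengths: list[int], k: int) -> float:
    """Mean length of the k longest sequences (all of them if fewer than k)."""
    if k < 1 or not lengths:
        raise ValueError("need k >= 1 and at least one length")
    top = sorted(lengths, reverse=True)[:k]
    return sum(top) / len(top)


contigs = [100, 200, 300, 400, 500]  # hypothetical contig lengths
print(n50(contigs))                  # 400: 500 + 400 = 900 >= 1500 / 2
print(mean_of_longest(contigs, 2))   # 450.0
```

Comparing both metrics on the same footing matters because N50 rewards a few long sequences, while the mean of the longest k focuses only on the top of the length distribution.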
Affiliation(s)
- Nam V Hoang
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
  - College of Agriculture and Forestry, Hue University, Hue, Vietnam
- Agnelo Furtado
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
- Patrick J Mason
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
- Annelie Marquardt
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
  - Sugar Research Australia, Indooroopilly, QLD, 4068, Australia
- Lakshmi Kasirajan
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
  - ICAR - Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
- Prathima P Thirugnanasambandam
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
  - ICAR - Sugarcane Breeding Institute, Coimbatore, Tamil Nadu, India
- Frederik C Botha
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia
  - Sugar Research Australia, Indooroopilly, QLD, 4068, Australia
- Robert J Henry
  - Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Room 2.245, Level 2, The John Hay Building, Queensland Biosciences Precinct [#80], 306 Carmody Road, St. Lucia, QLD, 4072, Australia