1
Miller JR, Adjeroh DA. Machine learning on alignment features for parent-of-origin classification of simulated hybrid RNA-seq. BMC Bioinformatics 2024; 25:109. [PMID: 38475727 DOI: 10.1186/s12859-024-05728-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/01/2024] [Indexed: 03/14/2024] Open
Abstract
BACKGROUND Parent-of-origin allele-specific gene expression (ASE) can be detected in interspecies hybrids by virtue of RNA sequence variants between the parental haplotypes. ASE is detectable by differential expression analysis (DEA) applied to the counts of RNA-seq read pairs aligned to parental references, but aligners do not always choose the correct parental reference. RESULTS We used public data for species that are known to hybridize. We measured our ability to assign RNA-seq read pairs to their proper transcriptome or genome references. We tested software packages that assign each read pair to a reference position and found that they often favored the incorrect species reference. To address this problem, we introduce a post process that extracts alignment features and trains a random forest classifier to choose the better alignment. On each simulated hybrid dataset tested, our machine-learning post-processor achieved higher accuracy than the aligner by itself at choosing the correct parent-of-origin per RNA-seq read pair. CONCLUSIONS For the parent-of-origin classification of RNA-seq, machine learning can improve the accuracy of alignment-based methods. This approach could be useful for enhancing ASE detection in interspecies hybrids, though RNA-seq from real hybrids may present challenges not captured by our simulations. We believe this is the first application of machine learning to this problem domain.
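The post-processing idea in this abstract can be illustrated with a minimal sketch. It assumes each read pair has been aligned once to each parental reference and that a few summary values are available per alignment. The `Alignment` fields, the `features` function, and the toy data are hypothetical and are not taken from the paper. The sketch only builds feature vectors and scores a score-only baseline; a trained classifier (the abstract names a random forest) would consume such vectors instead.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of one read pair aligned to one parental reference (hypothetical fields)."""
    score: int         # aligner-reported alignment score
    mismatches: int    # mismatched bases across both mates
    soft_clipped: int  # soft-clipped bases across both mates


def features(to_parent1, to_parent2):
    """Build a feature vector; positive values favour parent 1."""
    return [
        to_parent1.score - to_parent2.score,
        to_parent2.mismatches - to_parent1.mismatches,
        to_parent2.soft_clipped - to_parent1.soft_clipped,
    ]


def baseline_call(to_parent1, to_parent2):
    """Score-only baseline: pick the higher score, ties go to parent 1 (arbitrary)."""
    return 1 if to_parent1.score >= to_parent2.score else 2


def accuracy(calls, truth):
    """Fraction of read pairs assigned to their true parent."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


# Toy simulated read pairs: (alignment to parent 1, alignment to parent 2, true parent).
toy_pairs = [
    (Alignment(100, 0, 0), Alignment(90, 2, 0), 1),
    (Alignment(80, 4, 0), Alignment(98, 0, 0), 2),
    (Alignment(95, 1, 0), Alignment(95, 0, 0), 2),  # tied score, fewer mismatches on parent 2
]

calls = [baseline_call(a, b) for a, b, _ in toy_pairs]
truth = [t for _, _, t in toy_pairs]
baseline_accuracy = accuracy(calls, truth)
tied_pair_features = features(toy_pairs[2][0], toy_pairs[2][1])
```

In the toy data the third pair has tied scores, so the score-only baseline miscalls it, while the mismatch feature still points to parent 2. This is the kind of signal a learned classifier could exploit.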
Affiliation(s)
- Jason R Miller
- Department of Computer Science, Mathematics, Engineering, Shepherd University, Shepherdstown, WV, USA.
- EVOGENE, Department of Biosciences, University of Oslo, Oslo, Norway.
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA.
| | - Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
2
Burioli EAV, Hammel M, Vignal E, Vidal-Dupiol J, Mitta G, Thomas F, Bierne N, Destoumieux-Garzón D, Charrière GM. Transcriptomics of mussel transmissible cancer MtrBTN2 suggests accumulation of multiple cancer traits and oncogenic pathways shared among bilaterians. Open Biol 2023; 13:230259. [PMID: 37816387 PMCID: PMC10564563 DOI: 10.1098/rsob.230259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 09/12/2023] [Indexed: 10/12/2023] Open
Abstract
Transmissible cancer cell lines are rare biological entities giving rise to diseases at the crossroads of cancer and parasitic diseases. These malignant cells have acquired the amazing capacity to spread from host to host. They have been described only in dogs, Tasmanian devils and marine bivalves. The Mytilus trossulus bivalve transmissible neoplasia 2 (MtrBTN2) lineage has even acquired the capacity to spread inter-specifically between marine mussels of the Mytilus edulis complex worldwide. To identify the oncogenic processes underpinning the biology of these atypical cancers we performed transcriptomics of MtrBTN2 cells. Differential expression, enrichment, protein-protein interaction network, and targeted analyses were used. Overall, our results suggest the accumulation of multiple cancerous traits that may be linked to the long-term evolution of MtrBTN2. We also highlight that vertebrate and lophotrochozoan cancers could share a large panel of common drivers, which supports the hypothesis of an ancient origin of oncogenic processes in bilaterians.
Affiliation(s)
- E A V Burioli
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
| | - M Hammel
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - E Vignal
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
| | - J Vidal-Dupiol
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
| | - G Mitta
- IFREMER, UMR 241 Écosystèmes Insulaires Océaniens, Labex Corail, Centre Ifremer du Pacifique, Tahiti, Polynésie française
| | - F Thomas
- CREEC/CANECEV (CREES), MIVEGEC, Unité Mixte de Recherches, IRD 224-CNRS 5290-Université de Montpellier, Montpellier, France
| | - N Bierne
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - D Destoumieux-Garzón
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
| | - G M Charrière
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Montpellier, France
3
Theissinger K, Fernandes C, Formenti G, Bista I, Berg PR, Bleidorn C, Bombarely A, Crottini A, Gallo GR, Godoy JA, Jentoft S, Malukiewicz J, Mouton A, Oomen RA, Paez S, Palsbøll PJ, Pampoulie C, Ruiz-López MJ, Secomandi S, Svardal H, Theofanopoulou C, de Vries J, Waldvogel AM, Zhang G, Jarvis ED, Bálint M, Ciofi C, Waterhouse RM, Mazzoni CJ, Höglund J. How genomics can help biodiversity conservation. Trends Genet 2023:S0168-9525(23)00020-3. [PMID: 36801111 DOI: 10.1016/j.tig.2023.01.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 06/17/2022] [Revised: 11/08/2022] [Accepted: 01/19/2023] [Indexed: 02/18/2023]
Abstract
The availability of public genomic resources can greatly assist biodiversity assessment, conservation, and restoration efforts by providing evidence for scientifically informed management decisions. Here we survey the main approaches and applications in biodiversity and conservation genomics, considering practical factors, such as cost, time, prerequisite skills, and current shortcomings of applications. Most approaches perform best in combination with reference genomes from the target species or closely related species. We review case studies to illustrate how reference genomes can facilitate biodiversity research and conservation across the tree of life. We conclude that the time is ripe to view reference genomes as fundamental resources and to integrate their use as a best practice in conservation genomics.
Affiliation(s)
- Kathrin Theissinger
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany
| | - Carlos Fernandes
- CE3C - Centre for Ecology, Evolution and Environmental Changes & CHANGE - Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal; Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
| | - Giulio Formenti
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Iliana Bista
- Naturalis Biodiversity Center, Darwinweg 2, 2333, CR, Leiden, The Netherlands; Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Paul R Berg
- NIVA - Norwegian Institute for Water Research, Økernveien, 94, 0579 Oslo, Norway; Centre for Coastal Research, University of Agder, Gimlemoen 25j, 4630 Kristiansand, Norway; Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO BOX 1066 Blindern, 0316 Oslo, Norway
| | - Christoph Bleidorn
- University of Göttingen, Department of Animal Evolution and Biodiversity, Untere Karspüle, 2, 37073, Göttingen, Germany
| | | | - Angelica Crottini
- CIBIO/InBio, Centro de Investigação em Biodiversidade e Recursos Genéticos, Rua Padre Armando Quintas, 7, 4485-661, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, 4099-002 Porto, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661 Vairão, Portugal
| | - Guido R Gallo
- Department of Biosciences, University of Milan, Milan, Italy
| | - José A Godoy
- Estación Biológica de Doñana, CSIC, Calle Americo Vespucio 26, 41092, Seville, Spain
| | - Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO BOX 1066 Blindern, 0316 Oslo, Norway
| | - Joanna Malukiewicz
- Primate Genetics Laboratory, German Primate Center, Kellnerweg 4, 37077, Göttingen, Germany
| | - Alice Mouton
- InBios - Conservation Genetics Lab, University of Liege, Chemin de la Vallée 4, 4000, Liege, Belgium
| | - Rebekah A Oomen
- Centre for Coastal Research, University of Agder, Gimlemoen 25j, 4630 Kristiansand, Norway; Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO BOX 1066 Blindern, 0316 Oslo, Norway
| | - Sadye Paez
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Per J Palsbøll
- Groningen Institute of Evolutionary Life Sciences, University of Groningen, Nijenborgh, 9747, AG, Groningen, The Netherlands; Center for Coastal Studies, 5 Holway Avenue, Provincetown, MA 02657, USA
| | - Christophe Pampoulie
- Marine and Freshwater Research Institute, Fornubúðir 5, 220 Hafnarfjörður, Iceland
| | - María J Ruiz-López
- Estación Biológica de Doñana, CSIC, Calle Americo Vespucio 26, 41092, Seville, Spain; CIBER de Epidemiología y Salud Pública (CIBERESP), Spain
| | | | - Hannes Svardal
- Department of Biology, University of Antwerp, Universiteitsplein 1, 2610 Wilrijk, Antwerp, Belgium
| | - Constantina Theofanopoulou
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA; Hunter College, City University of New York, NY, USA
| | - Jan de Vries
- University of Goettingen, Institute for Microbiology and Genetics, Department of Applied Bioinformatics, Goettingen Center for Molecular Biosciences (GZMB), Campus Institute Data Science (CIDAS), Goldschmidtstr. 1, 37077, Goettingen, Germany
| | - Ann-Marie Waldvogel
- Institute of Zoology, University of Cologne, Zülpicherstrasse 47b, D-50674, Cologne, Germany
| | - Guojie Zhang
- Evolutionary & Organismal Biology Research Center, Zhejiang University School of Medicine, Hangzhou, 310058, China; Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Denmark; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
| | - Erich D Jarvis
- The Rockefeller University, 1230 York Ave, New York, NY 10065, USA
| | - Miklós Bálint
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Biodiversity and Climate Research Centre, Georg-Voigt-Str. 14-16, 60325 Frankfurt/Main, Germany
| | - Claudio Ciofi
- University of Florence, Department of Biology, Via Madonna del Piano 6, Sesto Fiorentino, (FI) 50019, Italy
| | - Robert M Waterhouse
- University of Lausanne, Department of Ecology and Evolution, Le Biophore, UNIL-Sorge, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Camila J Mazzoni
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Str 17, 10315 Berlin, Germany; Berlin Center for Genomics in Biodiversity Research (BeGenDiv), Koenigin-Luise-Str 6-8, 14195 Berlin, Germany
| | - Jacob Höglund
- Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75246, Uppsala, Sweden.
4
Tao F, Fan C, Liu Y, Sivakumar S, Kowalski KP, Golenberg EM. Optimization and application of non-native Phragmites australis transcriptome assemblies. PLoS One 2023; 18:e0280354. [PMID: 36689482 PMCID: PMC9870158 DOI: 10.1371/journal.pone.0280354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/27/2022] [Indexed: 01/24/2023] Open
Abstract
Phragmites australis (common reed) has a cosmopolitan distribution and has been suggested as a model organism for the study of invasive plant species. In North America, the non-native subspecies (ssp. australis) is widely distributed across the contiguous 48 states in the United States and large parts of Canada. Even though millions of dollars are spent annually on Phragmites management, insufficient knowledge of P. australis has impeded the efficiency of management. To solve this problem, transcriptomic information generated from multiple types of tissue could be a valuable resource for future studies. Here, we constructed forty-nine P. australis transcriptome assemblies via different assembly tools and multiple parameter settings. The optimal transcriptome assembly for functional annotation and downstream analyses was selected among these transcriptome assemblies by comprehensive assessments. For a total of 422,589 transcripts assembled in this transcriptome assembly, 319,046 transcripts (75.5%) have at least one functional annotation. Within the transcriptome assembly, we further identified 1,495 transcripts showing tissue-specific expression patterns, 10,828 putative transcription factors, and 72,165 candidates for simple sequence repeat markers. The identification and analyses of predicted transcripts related to herbicide- and salinity-resistant genes were shown as two applications of the transcriptomic information to facilitate further research on P. australis. Transcriptome assembly and selection would be important for the transcriptome annotation. With this optimal transcriptome assembly and all related information from downstream analyses, we have helped to establish foundations for future studies on the mechanisms underlying the invasiveness of non-native P. australis subspecies.
Affiliation(s)
- Feng Tao
- Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
| | - Chuanzhu Fan
- Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
| | - Yimin Liu
- Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
| | - Subashini Sivakumar
- Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
| | - Kurt P. Kowalski
- U.S. Geological Survey-Great Lakes Science Center, Ann Arbor, MI, United States of America
| | - Edward M. Golenberg
- Department of Biological Sciences, Wayne State University, Detroit, MI, United States of America
5
Tu M, Zeng J, Zhang J, Fan G, Song G. Unleashing the power within short-read RNA-seq for plant research: Beyond differential expression analysis and toward regulomics. Front Plant Sci 2022; 13:1038109. [PMID: 36570898 PMCID: PMC9773216 DOI: 10.3389/fpls.2022.1038109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 11/21/2022] [Indexed: 06/17/2023]
Abstract
RNA-seq has become a state-of-the-art technique for transcriptomic studies. Advances in both RNA-seq techniques and the corresponding analysis tools and pipelines have unprecedentedly shaped our understanding in almost every aspect of plant sciences. Notably, the integration of huge amounts of RNA-seq data with other omic data sets in the model plants and major crop species has facilitated plant regulomics, while RNA-seq analysis has still been primarily used for differential expression analysis in many less-studied plant species. To unleash the analytical power of RNA-seq in plant species, especially less-studied species and biomass crops, we summarize recent achievements of RNA-seq analysis in the major plant species and representative tools in the four types of application: (1) transcriptome assembly, (2) construction of expression atlas, (3) network analysis, and (4) structural alteration. We emphasize the importance of expression atlases, coexpression networks and predictions of gene regulatory relationships in moving plant transcriptomes toward regulomics, an omic view of genome-wide transcription regulation. We highlight what can be achieved in plant research with RNA-seq by introducing a list of representative RNA-seq analysis tools and resources that are developed for certain minor species or suitable for analysis without species limitation. In summary, we provide an updated digest on RNA-seq tools, resources and the diverse applications for plant research, and our perspective on the power and challenges of short-read RNA-seq analysis from a regulomic point of view. A full utilization of these fruitful RNA-seq resources will promote plant omic research to a higher level, especially in those less-studied species.
Affiliation(s)
- Min Tu
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Jian Zeng
- Guangdong Provincial Key Laboratory of Utilization and Conservation of Food and Medicinal Resources in Northern Region, Shaoguan University, Shaoguan, Guangdong, China
| | - Juntao Zhang
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Guozhi Fan
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
| | - Guangsen Song
- School of Chemical and Environmental Engineering, Wuhan Polytechnic University, Wuhan, China
6
Genome-Wide Identification and Characterization of RNA/DNA Differences Associated with Fusarium graminearum Infection in Wheat. Int J Mol Sci 2022; 23:ijms23147982. [PMID: 35887327 PMCID: PMC9316857 DOI: 10.3390/ijms23147982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 06/29/2022] [Accepted: 07/14/2022] [Indexed: 12/03/2022] Open
Abstract
RNA/DNA difference (RDD) is a post-transcriptional modification playing a crucial role in regulating diverse biological processes in eukaryotes. Although it has been extensively studied in plant chloroplast and mitochondria genomes, RDDs in plant nuclear genomes are not well studied at present. Here, we investigated the RDDs associated with fusarium head blight (FHB) through a novel method by comparing the RNA-seq data between Fusarium-infected and control samples of four wheat genotypes. A total of 187 high-confidence unique RDDs in 36 genes were identified, representing the first landscape of the FHB-responsive RDD in wheat. The majority (26) of these 36 RDD genes were correlated either positively or negatively with FHB levels. Effects of these RDDs on RNA and protein sequences have been identified, their editing frequency and the expression level of the corresponding genes provided, and the prediction of the effect on the minimum folding free energy of mRNA, miRNA binding, and colocation of RDDs with conserved domains presented. RDDs were predicted to induce modifications in the mRNA and protein structures of the corresponding genes. In two genes, TraesCS1B02G294300 and TraesCS3A02G263900, editing was predicted to enhance their affinity with tae-miR9661-5p and tae-miR9664-3p, respectively. To our knowledge, this study is the first report of the association between RDD and FHB in wheat; this will contribute to a better understanding of the molecular basis underlying FHB resistance, and potentially lead to novel strategies to improve wheat FHB resistance through epigenetic methods.
7
Potemkin N, Cawood SMF, Treece J, Guévremont D, Rand CJ, McLean C, Stanton JAL, Williams JM. A method for simultaneous detection of small and long RNA biotypes by ribodepleted RNA-Seq. Sci Rep 2022; 12:621. [PMID: 35022475 PMCID: PMC8755727 DOI: 10.1038/s41598-021-04209-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 11/24/2021] [Indexed: 11/09/2022] Open
Abstract
RNA sequencing offers unprecedented access to the transcriptome. Key to this is the identification and quantification of many different species of RNA from the same sample at the same time. In this study we describe a novel protocol for simultaneous detection of coding and non-coding transcripts using modifications to the Ion Total RNA-Seq kit v2 protocol, with integration of QIASeq FastSelect rRNA removal kit. We report highly consistent sequencing libraries can be produced from both frozen high integrity mouse hippocampal tissue and the more challenging post-mortem human tissue. Removal of rRNA using FastSelect was extremely efficient, resulting in less than 1.5% rRNA content in the final library. We identified > 30,000 unique transcripts from all samples, including protein-coding genes and many species of non-coding RNA, in biologically-relevant proportions. Furthermore, the normalized sequencing read count for select genes significantly negatively correlated with Ct values from qRT-PCR analysis from the same samples. These results indicate that this protocol accurately and consistently identifies and quantifies a wide variety of transcripts simultaneously. The highly efficient rRNA depletion, coupled with minimized sample handling and without complicated and high-loss size selection protocols, makes this protocol useful to researchers wishing to investigate whole transcriptomes.
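The negative correlation between normalized read counts and qRT-PCR Ct values reported in this abstract follows from how Ct works: a more abundant transcript crosses the detection threshold in fewer cycles, so higher counts pair with lower Ct. A minimal sketch of such a check, using a hand-written Pearson correlation and made-up toy values (not data from the paper), might look like this:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values for four genes (illustrative only).
log2_normalized_counts = [10.0, 8.0, 6.0, 4.0]
ct_values = [18.0, 20.0, 22.0, 24.0]

r = pearson_r(log2_normalized_counts, ct_values)
```

With these perfectly anti-correlated toy values `r` is -1.0; real data would give a weaker negative value and would need a significance test as well.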
Affiliation(s)
- Nikita Potemkin
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
| | - Sophie M F Cawood
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
| | - Jackson Treece
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
| | - Diane Guévremont
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
| | - Christy J Rand
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
| | - Catriona McLean
- Victorian Brain Bank, The Florey Institute of Neuroscience and Mental Health, Melbourne, VIC, Australia
- Anatomical Pathology, The Alfred Hospital, Melbourne, VIC, Australia
| | - Jo-Ann L Stanton
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
| | - Joanna M Williams
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand.
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand.
8
Sewe SO, Silva G, Sicat P, Seal SE, Visendi P. Trimming and Validation of Illumina Short Reads Using Trimmomatic, Trinity Assembly, and Assessment of RNA-Seq Data. Methods Mol Biol 2022; 2443:211-232. [PMID: 35037208 DOI: 10.1007/978-1-0716-2067-0_11] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 05/20/2023]
Abstract
Next-generation sequencing (NGS) technologies can generate billions of reads in a single sequencing run. However, with such high throughput come quality issues that have to be addressed before undertaking downstream analysis. Quality control on short reads is usually performed at default settings due to a lack of in-depth understanding of a particular software's parameters and the effect on the output if they are changed. Here we demonstrate how to optimize read trimming using Trimmomatic. We highlight the benefits of trimming by comparing the quality of transcripts assembled using trimmed and untrimmed reads.
Affiliation(s)
- Steven O Sewe
- Natural Resources Institute, University of Greenwich, Kent, UK
| | - Gonçalo Silva
- Natural Resources Institute, University of Greenwich, Kent, UK
| | - Paulo Sicat
- Natural Resources Institute, University of Greenwich, Kent, UK
| | - Susan E Seal
- Natural Resources Institute, University of Greenwich, Kent, UK
| | - Paul Visendi
- Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, Australia.
9
Shmakov NA. Improving the quality of barley transcriptome de novo assembling by using a hybrid approach for lines with varying spike and stem coloration. Vavilovskii Zhurnal Genet Selektsii 2021; 25:30-38. [PMID: 34901701 PMCID: PMC8627909 DOI: 10.18699/vj21.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2020] [Revised: 01/15/2021] [Accepted: 01/15/2021] [Indexed: 11/19/2022] Open
Abstract
De novo transcriptome assembly is an important stage of RNA-seq data computational analysis. It allows researchers to obtain the sequences of transcripts present in the biological sample of interest. The availability of an accurate and complete transcriptome sequence of the organism of interest is, in turn, an indispensable condition for further analysis of RNA-seq data. Through years of transcriptomic research, the bioinformatics community has developed a number of assembler programs for transcriptome reconstruction from short reads of RNA-seq libraries. Different assemblers make it possible to conduct a de novo transcriptome reconstruction and a genome-guided reconstruction. The majority of the assemblers working with RNA-seq data are based on the De Bruijn graph method of sequence reconstruction. However, specifics of their procedures can vary drastically, as do their results. A number of authors recommend a hybrid approach to transcriptome reconstruction based on combining the results of several assemblers in order to achieve a better transcriptome assembly. The advantage of this approach has been demonstrated in a number of studies, with RNA-seq experiments conducted on the Illumina platform. In this paper, we propose a hybrid approach for creating a transcriptome assembly of the barley Hordeum vulgare isogenic line Bowman and two nearly isogenic lines contrasting in spike pigmentation, based on the results of sequencing on the IonTorrent platform. This approach implements several de novo assemblers: Trinity, Trans-ABySS and rnaSPAdes. Several assembly metrics were examined: the percentage of reference transcripts observed in the assemblies, the percentage of RNA-seq reads involved, and BUSCO scores. It was shown that, based on the summation of these metrics, the transcriptome meta-assembly surpasses the individual transcriptome assemblies it consists of.
Affiliation(s)
- N A Shmakov
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomics Center, Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
10
Madritsch S, Burg A, Sehr EM. Comparing de novo transcriptome assembly tools in di- and autotetraploid non-model plant species. BMC Bioinformatics 2021; 22:146. [PMID: 33752598 PMCID: PMC7986043 DOI: 10.1186/s12859-021-04078-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/05/2020] [Accepted: 03/15/2021] [Indexed: 01/15/2023] Open
Abstract
Background Polyploidy is very common in plants and can be seen as one of the key drivers in the domestication of crops and the establishment of important agronomic traits. It can be the main source of genomic repatterning and introduces gene duplications, affecting gene expression and alternative splicing. Since fully sequenced genomes are not yet available for many plant species including crops, de novo transcriptome assembly is the basis to understand molecular and functional mechanisms. However, in complex polyploid plants, de novo transcriptome assembly is challenging, leading to increased rates of fused or redundant transcripts. Since assemblers were developed mainly for diploid organisms, they may not be well suited for polyploids. Also, comparative evaluations of these tools on higher polyploid plants are extremely rare. Thus, our aim was to fill this gap and to provide a basic guideline for choosing the optimal de novo assembly strategy focusing on autotetraploids, as the scientific interest in this type of polyploidy is steadily increasing. Results We present a comparison of two common (SOAPdenovo-Trans, Trinity) and one recently published transcriptome assembler (TransLiG) on diploid and autotetraploid species of the genera Acer and Vaccinium using Arabidopsis thaliana as a reference. The number of assembled transcripts was up to 11 and 14 times higher with an increased number of short transcripts for Acer and Vaccinium, respectively, compared to A. thaliana. In diploid samples, Trinity and TransLiG performed similarly well, while in autotetraploids, TransLiG assembled the most complete transcriptomes with an average of 1916 assembled BUSCOs vs. 1705 BUSCOs for Trinity. Of all three assemblers, SOAPdenovo-Trans performed worst (1133 complete BUSCOs). Conclusion All three assembly tools produced complete assemblies when dealing with the model organism A. thaliana, independently of its ploidy level, but their performances differed extremely when it comes to non-model autotetraploids, where specifically TransLiG and Trinity produced a high number of redundant transcripts. The recently published assembler TransLiG has not been tested yet on any plant organism but showed the highest completeness and full-length transcriptomes, especially in autotetraploids. Including such species during the development and testing of new assembly tools is highly appreciated and recommended as many important crops are polyploid. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04078-8.
Affiliation(s)
- Silvia Madritsch
- AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria; Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna, Medical University of Vienna, Vienna, Austria
| | - Agnes Burg
- AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria
| | - Eva M Sehr
- AIT Austrian Institute of Technology, Center for Health and Bioresources, Tulln, Austria.
11
Lopes JML, de Matos EM, de Queiroz Nascimento LS, Viccini LF. Validation of reference genes for quantitative gene expression in the Lippia alba polyploid complex (Verbenaceae). Mol Biol Rep 2021; 48:1037-1044. [PMID: 33547533 DOI: 10.1007/s11033-021-06183-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/13/2020] [Accepted: 01/21/2021] [Indexed: 11/30/2022]
Abstract
Lippia alba (Verbenaceae) is one of the most studied species of the genus Lippia, mainly due to its medicinal properties. The species was described as a polyploid complex with five cytotypes. The comparison of gene expression in species with several ploidal levels needs to be conducted carefully due to possible changes in gene regulation. Quantitative reverse transcription PCR (qRT-PCR) is a widely used method for transcript abundance analyses in plants. Although it is an extremely powerful technique, relative quantification by Real-Time quantitative PCR (RT-qPCR) requires normalization with a stable reference gene. We evaluated the stability of nine candidate reference genes in Lippia alba with different ploidal levels using NormFinder, geNorm, and RefFinder software. The product of each primer pair showed a single peak in the melting curve. The R² value ranged from 0.998 to 1.000 and primer efficiency ranged from 98.95% to 129%. The CIT gene emerged as a stable housekeeping gene, appropriate for studies in polyploid accessions of Lippia alba. Considering that polyploidy is widely documented in Angiosperms, the results can be used not only for further gene expression studies in L. alba but also as a possible reference gene for other polyploid complexes. Differential stability among different genes highlights the importance of validating the reference genes used for the RT-qPCR approach in polyploid studies.
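The R² and primer efficiency values reported above are conventionally derived from a standard curve: quantification cycle (Cq) values are regressed on the log10 of template amount across a dilution series, and efficiency is computed from the slope as E = 10^(-1/slope) - 1. The sketch below illustrates that standard calculation only; the abstract does not state how the authors computed their values, and the function name and the dilution data are hypothetical.

```python
def standard_curve(log10_amounts, cq_values):
    """Fit Cq = slope * log10(amount) + intercept by least squares.

    Returns (slope, r_squared, efficiency_percent), where efficiency is
    the conventional (10 ** (-1 / slope) - 1) * 100.
    """
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    syy = sum((y - mean_y) ** 2 for y in cq_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, cq_values))
    slope = sxy / sxx
    r_squared = (sxy ** 2) / (sxx * syy)
    efficiency_percent = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, r_squared, efficiency_percent


# Hypothetical five-point tenfold dilution series (not data from the study).
log10_amounts = [0, -1, -2, -3, -4]
cq_values = [18.1, 21.4, 24.8, 28.1, 31.5]

slope, r_squared, efficiency = standard_curve(log10_amounts, cq_values)
print(f"slope={slope:.3f}  R2={r_squared:.4f}  efficiency={efficiency:.1f}%")
```

A slope near -3.32 corresponds to 100% efficiency (perfect doubling per cycle); shallower slopes give efficiencies above 100%.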
Affiliation(s)
- Juliana Mainenti Leal Lopes
- Departamento de Biologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora, Minas Gerais, 36036-900, Brazil
| | - Elyabe Monteiro de Matos
- Departamento de Biologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora, Minas Gerais, 36036-900, Brazil
| | - Laís Stehling de Queiroz Nascimento
- Departamento de Biologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora, Minas Gerais, 36036-900, Brazil
| | - Lyderson Facio Viccini
- Departamento de Biologia, Instituto de Ciências Biológicas, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora, Minas Gerais, 36036-900, Brazil.
|
12
|
Abstract
RNA-Seq is nowadays an indispensable approach for comparative transcriptome profiling in model and nonmodel organisms. Analyzing RNA-Seq data from nonmodel organisms poses unique challenges, due to unavailability of a high-quality genome reference and to relative sparsity of tools for downstream functional analyses. In this chapter, we provide an overview of the analysis steps in RNA-Seq projects of nonmodel organisms, while elaborating on aspects that are unique to this analysis. These will include (1) strategic decisions that have to be made in advance, regarding sequencing technology and reference to use; (2) how to search for available draft genomes, and, if necessary, how to improve their gene prediction and annotation; (3) how to clean raw reads before de novo assembly; (4) how to separate the reads in RNA-Seq projects of symbiont organisms; (5) how to design and carry out a de novo transcriptome assembly that will be comprehensive and reliable; (6) how to assess transcriptome quality; (7) when and how to reduce redundancy in the transcriptome; (8) techniques and considerations in transcriptome functional annotation; (9) quantitating transcript abundance in the face of high transcriptome redundancy; and, most importantly, (10) how to achieve functional enrichment testing using available tools which either support a large range of species or enable a universal, non-species-specific analysis. Throughout the chapter, we will refer to a variety of useful software tools. For the initial analysis steps involving high-volume data, these will include Linux-based programs. For the later steps, we will describe both Linux and R packages for advanced users, as well as many user-friendly tools for nonprogrammers. Finally, we will present a full workflow for RNA-Seq analysis of nonmodel organisms using the NeatSeq-Flow platform, which can be used locally through a user-friendly interface.
Affiliation(s)
- Vered Chalifa-Caspi
- Bioinformatics Core Facility, Ben-Gurion University of the Negev, Beer-Sheva, Israel.
|
13
|
Khan Y, Hammarström D, Rønnestad BR, Ellefsen S, Ahmad R. Increased biological relevance of transcriptome analyses in human skeletal muscle using a model-specific pipeline. BMC Bioinformatics 2020; 21:548. [PMID: 33256614 PMCID: PMC7708234 DOI: 10.1186/s12859-020-03866-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2020] [Accepted: 11/09/2020] [Indexed: 12/12/2022] Open
Abstract
Background Human skeletal muscle responds to weight-bearing exercise with significant inter-individual differences. Investigation of transcriptome responses could improve our understanding of this variation. However, this requires bioinformatic pipelines to be established and evaluated in study-specific contexts. Skeletal muscle subjected to mechanical stress, such as through resistance training (RT), accumulates RNA due to increased ribosomal biogenesis. When a fixed amount of total-RNA is used for RNA-seq library preparations, mRNA counts are thus assessed in different amounts of tissue, potentially invalidating subsequent conclusions. The purpose of this study was to establish a bioinformatic pipeline specific for analysis of RNA-seq data from skeletal muscles, to explore the effects of different normalization strategies and to identify genes responding to RT in a volume-dependent manner (moderate vs. low volume). To this end, we analyzed RNA-seq data derived from a twelve-week RT intervention, wherein 25 participants performed both low- and moderate-volume leg RT, allocated to the two legs in a randomized manner. Bilateral muscle biopsies were sampled from m. vastus lateralis before and after the intervention, as well as before and after the fifth training session (Week 2). Result Bioinformatic tools were selected based on read quality, observed gene counts, methodological variation between paired observations, and correlations between mRNA abundance and protein expression of myosin heavy chain family proteins. Different normalization strategies were compared to account for global changes in RNA to tissue ratio. After accounting for the amounts of muscle tissue used in library preparation, global mRNA expression increased by 43–53%. At Week 2, this was accompanied by dose-dependent increases for 21 genes in rested-state muscle, most of which were related to the extracellular matrix. 
In contrast, at Week 12, no readily explainable dose-dependencies were observed. Instead, traditional normalization and non-normalized models resulted in counterintuitive reverse dose-dependency for many genes. Overall, training led to robust transcriptome changes, with the number of differentially expressed genes ranging from 603 to 5110, varying with time point and normalization strategy. Conclusion Optimized selection of bioinformatic tools increases the biological relevance of transcriptome analyses from resistance-trained skeletal muscle. Moreover, normalization procedures need to account for global changes in rRNA and mRNA abundance.
Affiliation(s)
- Yusuf Khan
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317, Hamar, Norway; Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway
| | - Daniel Hammarström
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway; Swedish School of Sport and Health Sciences, Stockholm, Sweden
| | - Bent R Rønnestad
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway
| | - Stian Ellefsen
- Section for Health and Exercise Physiology, Department of Public Health and Sport Sciences, Inland Norway University of Applied Sciences, Lillehammer, Norway; Innlandet Hospital Trust, Lillehammer, Norway
| | - Rafi Ahmad
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317, Hamar, Norway; Faculty of Health Sciences, Institute of Clinical Medicine, UiT - The Arctic University of Norway, Hansine Hansens veg 18, 9019, Tromsø, Norway.
|
14
|
Chen LY, Morales-Briones DF, Passow CN, Yang Y. Performance of gene expression analyses using de novo assembled transcripts in polyploid species. Bioinformatics 2020; 35:4314-4320. [PMID: 31400193 DOI: 10.1093/bioinformatics/btz620] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/29/2018] [Revised: 07/12/2019] [Accepted: 08/09/2019] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The quality of gene expression analyses using de novo assembled transcripts in species that experienced recent polyploidization remains unexplored. RESULTS Differential gene expression (DGE) analyses using putative genes inferred by Trinity, Corset and Grouper performed slightly differently across five plant species that experienced various polyploidy histories. In species lacking polyploidy events within the past several million years, DGE analyses using de novo assembled transcriptomes identified 54-82% of the differentially expressed genes recovered by mapping reads to the reference genes. However, in species that experienced more recent polyploidy events, the percentage decreased to 21-65%. Gene co-expression network analyses using de novo assemblies versus mapping to the reference genes recovered the same module that significantly correlated with treatment in one species that lacks recent polyploidization. AVAILABILITY AND IMPLEMENTATION Commands and scripts used in this study are available at https://bitbucket.org/lychen83/chen_et_al_2018_benchmark_dge/; analysis files are available at Dryad doi: 10.5061/dryad.4p6n481. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
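The recovery percentages quoted above express how many of the differentially expressed genes found by reference-based mapping are also found when using the de novo assembly. A minimal sketch of that kind of calculation follows. It assumes the de novo putative genes have already been assigned to reference gene identifiers, a step the abstract does not describe; the function name and gene IDs are hypothetical.

```python
def recovery_rate(reference_de_genes, de_novo_de_genes):
    """Fraction of reference-based DE genes also detected via the de novo assembly."""
    reference = set(reference_de_genes)
    if not reference:
        raise ValueError("reference DE gene set is empty")
    recovered = reference & set(de_novo_de_genes)
    return len(recovered) / len(reference)


# Hypothetical gene identifiers, for illustration only.
reference_hits = {"gene01", "gene02", "gene03", "gene04"}
de_novo_hits = {"gene02", "gene03", "gene09"}

print(f"recovered {recovery_rate(reference_hits, de_novo_hits):.0%} of reference DE genes")
```

Genes detected only by the de novo route (here "gene09") do not affect this ratio; a separate precision-style measure would be needed to account for them.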
Affiliation(s)
- Ling-Yun Chen
- Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, Saint Paul, MN, USA
| | - Diego F Morales-Briones
- Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, Saint Paul, MN, USA
| | - Courtney N Passow
- Department of Ecology Evolution and Behavior, University of Minnesota, Twin Cities, Saint Paul, MN, USA; University of Minnesota Genomics Center, University of Minnesota, Twin Cities, Saint Paul, MN, USA
| | - Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota, Twin Cities, Saint Paul, MN, USA
|
15
|
Escobar-Camacho D, Carleton KL, Narain DW, Pierotti MER. Visual pigment evolution in Characiformes: The dynamic interplay of teleost whole-genome duplication, surviving opsins and spectral tuning. Mol Ecol 2020; 29:2234-2253. [PMID: 32421918 DOI: 10.1111/mec.15474] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/15/2019] [Revised: 05/09/2020] [Accepted: 05/11/2020] [Indexed: 01/06/2023]
Abstract
Vision represents an excellent model for studying adaptation, given the genotype-to-phenotype map that has been characterized in a number of taxa. Fish possess a diverse range of visual sensitivities and adaptations to underwater light, making them an excellent group to study visual system evolution. In particular, some speciose but understudied lineages can provide a unique opportunity to better understand aspects of visual system evolution such as opsin gene duplication and neofunctionalization. In this study, we showcase the visual system evolution of neotropical Characiformes and the spectral tuning mechanisms they exhibit to modulate their visual sensitivities. Such mechanisms include gene duplications and losses, gene conversion, opsin amino acid sequence and expression variation, and A1/A2-chromophore shifts. The characiforms we studied utilize three cone opsin classes (SWS2, RH2, LWS) and a rod opsin (RH1). However, the entire characiform opsin gene repertoire is a product of dynamic evolution by opsin gene loss (SWS1, RH2) and duplication (LWS, RH1). The LWS- and RH1-duplicates originated from a teleost-specific whole-genome duplication as well as characiform-specific duplication events. Both LWS-opsins exhibit gene conversion and, through substitutions in key tuning sites, one of the LWS-paralogues has acquired spectral sensitivity to green light. These sequence changes suggest reversion and parallel evolution of key tuning sites. Furthermore, characiforms' colour vision is based on the expression of both LWS-paralogues and SWS2. Finally, we found interspecific and intraspecific variation in A1/A2-chromophore proportions, correlating with the light environment. These multiple mechanisms may be a result of the diverse visual environments where Characiformes have evolved.
Affiliation(s)
| | - Karen L Carleton
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Devika W Narain
- Environmental Sciences, Anton de Kom University of Suriname, Paramaribo, Suriname
| | - Michele E R Pierotti
- Naos Marine Laboratories, Smithsonian Tropical Research Institute, Panama, Republic of Panama
|
16
|
The Utility of Genomic and Transcriptomic Data in the Construction of Proxy Protein Sequence Databases for Unsequenced Tree Nuts. BIOLOGY 2020; 9:biology9050104. [PMID: 32438695 PMCID: PMC7284556 DOI: 10.3390/biology9050104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 05/07/2020] [Accepted: 05/12/2020] [Indexed: 01/04/2023]
Abstract
As the apparent incidence of tree nut allergies rises, the development of MS methods that accurately identify tree nuts in food is critical. However, analyses are limited by few available tree nut protein sequences. We assess the utility of translated genomic and transcriptomic data for library construction with Juglans regia, walnut, as a model. Extracted walnuts were subjected to nano-liquid chromatography-mass spectrometry (n-LC-MS/MS), and spectra were searched against databases made from a six-frame translation of the genome (6FT), a transcriptome, and three proteomes. Searches against proteomic databases yielded a variable number of peptides (1156-1275), and only ten additional unique peptides were identified in the 6FT database. Searches against a transcriptomic database yielded results similar to those of the National Center for Biotechnology Information (NCBI) proteome (1200 and 1275 peptides, respectively). Performance of the transcriptomic database was improved via the adjustment of RNA-Seq read processing methods, which increased the number of identified peptides which align to seed allergen proteins by ~20%. Together, these findings establish a path towards the construction of robust proxy protein databases for tree nut species and other non-model organisms.
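A six-frame translation (6FT) reads a DNA sequence in three reading frames on the forward strand and three on the reverse complement, so that peptides can be matched without relying on gene annotation. The sketch below shows the general idea using the standard genetic code; it is not the authors' pipeline, the function names are hypothetical, and real 6FT databases usually also split the output at stop codons and filter by length.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with non-ACGT symbols become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical toy sequence; '*' marks a stop codon.
for label, protein in six_frame_translation("ATGGCCTAA").items():
    print(label, protein)
```

Trailing bases that do not complete a codon are dropped in each frame.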
|
17
|
Hu G, Grover CE, Arick MA, Liu M, Peterson DG, Wendel JF. Homoeologous gene expression and co-expression network analyses and evolutionary inference in allopolyploids. Brief Bioinform 2020; 22:1819-1835. [PMID: 32219306 PMCID: PMC7986634 DOI: 10.1093/bib/bbaa035] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/17/2019] [Revised: 02/06/2020] [Accepted: 02/24/2020] [Indexed: 12/29/2022] Open
Abstract
Polyploidy is a widespread phenomenon throughout eukaryotes. Due to the coexistence of duplicated genomes, polyploids offer unique challenges for estimating gene expression levels, which is essential for understanding the massive and various forms of transcriptomic responses accompanying polyploidy. Although previous studies have explored the bioinformatics of polyploid transcriptomic profiling, the causes and consequences of inaccurate quantification of transcripts from duplicated gene copies have not been addressed. Using transcriptomic data from the cotton genus (Gossypium) as an example, we present an analytical workflow to evaluate a variety of bioinformatic method choices at different stages of RNA-seq analysis, from homoeolog expression quantification to downstream analysis used to infer key phenomena of polyploid expression evolution. In general, EAGLE-RC and GSNAP-PolyCat outperform other quantification pipelines tested, and their derived expression dataset best represents the expected homoeolog expression and co-expression divergence. The performance of co-expression network analysis was less affected by homoeolog quantification than by network construction methods, where weighted networks outperformed binary networks. By examining the extent and consequences of homoeolog read ambiguity, we illuminate the potential artifacts that may affect our understanding of duplicate gene expression, including an overestimation of homoeolog co-regulation and the incorrect inference of subgenome asymmetry in network topology. Taken together, our work points to a set of reasonable practices that we hope are broadly applicable to the evolutionary exploration of polyploids.
Affiliation(s)
- Guanjing Hu
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Corrinne E Grover
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Mark A Arick
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Meiling Liu
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Daniel G Peterson
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Jonathan F Wendel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
|
18
|
Payá-Milans M, Olmstead JW, Nunez G, Rinehart TA, Staton M. Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species. Gigascience 2018; 7:5168871. [PMID: 30418578 PMCID: PMC6275443 DOI: 10.1093/gigascience/giy132] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/19/2018] [Accepted: 10/21/2018] [Indexed: 11/12/2022] Open
Abstract
Background The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis. Findings At the preprocessing step, we found error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis by selecting gentle quality trimming performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with size, quality, and completeness of de novo assemblies and with number of mapped reads. When selecting a reference genome from a related species to map reads, outcome was significantly improved when using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP. Conclusions The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters on de novo assemblies and availability of reads in downstream analysis.
Affiliation(s)
- Miriam Payá-Milans
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
| | - James W Olmstead
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
| | - Gerardo Nunez
- Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
| | - Timothy A Rinehart
- Thad Cochran Southern Horticultural Laboratory, USDA-Agricultural Research Service, PO Box 287, Poplarville, MS, 39470, United States; Crop Production and Protection, USDA-Agricultural Research Service, 5601 Sunnyside Ave, Beltsville, MD, 20705, United States
| | - Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
|