1
Yu T, Zhao X, Li G. TransMeta simultaneously assembles multisample RNA-seq reads. Genome Res 2022; 32:1398-1407. [PMID: 35858749 PMCID: PMC9341511 DOI: 10.1101/gr.276434.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 06/03/2022] [Indexed: 11/25/2022]
Abstract
Assembling RNA-seq reads into full-length transcripts is crucial in transcriptomic studies and poses computational challenges. Here we present TransMeta, a simple and robust algorithm that simultaneously assembles RNA-seq reads from multiple samples. TransMeta is designed based on the newly introduced vector-weighted splicing graph model, which enables accurate reconstruction of the consensus transcriptome via incorporating a cosine similarity-based combing strategy and a newly designed label-setting path-searching strategy. Tests on both simulated and real data sets show that TransMeta consistently outperforms PsiCLASS, StringTie2 plus its merge mode, and Scallop plus TACO, the most popular tools, in terms of precision and recall under a wide range of coverage thresholds at the meta-assembly level. Additionally, TransMeta consistently shows superior performance at the individual sample level.
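As a rough illustration of the cosine-similarity idea mentioned in this abstract, the sketch below compares two per-sample weight vectors of the kind a vector-weighted splicing graph might attach to its edges. It is not TransMeta's code: the function name and the example vectors are assumptions made purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors.

    Hypothetical helper: each vector is assumed to hold one coverage
    value per sample for a single edge of the graph.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # An edge with no coverage in any sample has no direction;
        # treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Two edges whose coverage rises and falls together across three samples
# point in the same direction, so they are plausible parts of one transcript.
edge_in = [10.0, 20.0, 30.0]
edge_out = [20.0, 40.0, 60.0]
similarity = cosine_similarity(edge_in, edge_out)
```

Cosine similarity ignores the overall magnitude of the vectors, so two edges can match even when one sample is sequenced much more deeply than another.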
Affiliation(s)
- Ting Yu
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Xiaoyu Zhao
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Guojun Li
  - Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
  - School of Mathematical Science, Liaocheng University, Liaocheng 252000, China
2
Tavakolian N, Frazão JG, Bendixsen D, Stelkens R, Li CB. Shepherd: Accurate Clustering for Correcting DNA Barcode Errors. Bioinformatics 2022; 38:3710-3716. [PMID: 35708611 PMCID: PMC9344852 DOI: 10.1093/bioinformatics/btac395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 03/26/2022] [Accepted: 06/13/2022] [Indexed: 11/13/2022]
Abstract
Motivation: DNA barcodes are short, random nucleotide sequences introduced into cell populations to track the relative counts of hundreds of thousands of individual lineages over time. Lineage tracking is widely applied, e.g. to understand evolutionary dynamics in microbial populations and the progression of breast cancer in humans. Barcode sequences are unknown upon insertion and must be identified using next-generation sequencing technology, which is error prone. In this study, we frame the barcode error correction task as a clustering problem with the aim to identify true barcode sequences from noisy sequencing data. We present Shepherd, a novel clustering method that is based on an indexing system of barcode sequences using k-mers, and a Bayesian statistical test incorporating a substitution error rate to distinguish true from error sequences.
Results: When benchmarking with synthetic data, Shepherd provides barcode count estimates that are significantly more accurate than state-of-the-art methods, producing 10–150 times fewer spurious lineages. For empirical data, Shepherd produces results that are consistent with the improvements seen on synthetic data. These improvements enable higher resolution lineage tracking and more accurate estimates of biologically relevant quantities, e.g. the detection of small effect mutations.
Availability and implementation: A Python implementation of Shepherd is freely available at: https://www.github.com/Nik-Tavakolian/Shepherd.
Supplementary information: Supplementary data are available at Bioinformatics online.
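To make the k-mer indexing idea in this abstract more concrete, here is a minimal sketch of a positional k-mer index used to find barcodes that lie within a small Hamming distance of a query. It is not Shepherd's implementation, and the Bayesian test described in the abstract is not reproduced; the function names, the positional indexing scheme, and the `max_dist` threshold are all assumptions for illustration.

```python
from collections import defaultdict


def build_kmer_index(barcodes, k):
    """Map each (position, k-mer) pair to the barcodes containing it."""
    index = defaultdict(set)
    for barcode in barcodes:
        for i in range(len(barcode) - k + 1):
            index[(i, barcode[i:i + k])].add(barcode)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(query, index, k, max_dist=1):
    """Barcodes sharing at least one positional k-mer with the query
    and lying within max_dist substitutions of it."""
    candidates = set()
    for i in range(len(query) - k + 1):
        candidates |= index.get((i, query[i:i + k]), set())
    return sorted(
        c for c in candidates
        if c != query and len(c) == len(query) and hamming(query, c) <= max_dist
    )


# Toy data (hypothetical): the second barcode differs from the first
# by a single substitution, the third is unrelated.
observed = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
kmer_index = build_kmer_index(observed, k=4)
close = neighbors("ACGTACGT", kmer_index, k=4)
```

The index restricts the expensive pairwise comparison to barcodes that already share a k-mer, which is what makes this kind of lookup practical for very large barcode sets.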
Affiliation(s)
- Nik Tavakolian
  - Department of Mathematics, Stockholm University, Stockholm, 10691, Sweden
- Devin Bendixsen
  - Department of Zoology, Stockholm University, Stockholm, 10691, Sweden
- Rike Stelkens
  - Department of Zoology, Stockholm University, Stockholm, 10691, Sweden
- Chun-Biu Li
  - Department of Mathematics, Stockholm University, Stockholm, 10691, Sweden
3
Zhao X, Yu T. Tiglon enables accurate transcriptome assembly via integrating mappings of different aligners. iScience 2022; 25:104067. [PMID: 35355524 PMCID: PMC8958329 DOI: 10.1016/j.isci.2022.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 02/09/2022] [Accepted: 03/10/2022] [Indexed: 11/01/2022]
Abstract
Full-length transcript reconstruction has a pivotal role in RNA-seq data analysis. In this research, we present a new genome-guided transcriptome assembly algorithm, namely Tiglon, which integrates multiple alignments of different mapping tools and builds the labeled splice graphs, followed by a label-based dynamic path-searching strategy to reconstruct the transcripts. We evaluate Tiglon on a simulated dataset and 12 real datasets under the Hisat2 and Star mappings. The results indicate that the integrating techniques of Tiglon exhibit great superiority over the state-of-the-art assemblers, including StringTie2 and Scallop, depending on Hisat2 alignments, Star alignments, or the merged alignments of both. In particular, Tiglon is significantly powerful in recovering lowly expressed transcripts.
- Tiglon is designed for integrating multiple alignments to assemble transcripts
- Integrating alignments of different aligners is helpful for transcriptome assembly
- Tiglon proposes a new graph model called the labeled splice graph
- Our experiments demonstrate that Tiglon outperforms the leading assemblers
4
CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure. PLoS Comput Biol 2021; 17:e1009631. [PMID: 34813594 PMCID: PMC8651127 DOI: 10.1371/journal.pcbi.1009631] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Revised: 12/07/2021] [Accepted: 11/11/2021] [Indexed: 11/19/2022]
Abstract
With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well established, de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools.
Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/. Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within, and subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely chimeras are to arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig-dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
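The link between graph complexity and chimeras described in this abstract can be sketched with a toy de Bruijn graph. The code below builds such a graph from reads and reports whether any node has more than one successor, which is where a traversal would have to choose between paths. It is only an illustration under simplifying assumptions: CStone's three-level classification is not reproduced, and the function names and the binary "ambiguous or not" check are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_ambiguous_paths(graph):
    """True if any node branches, i.e. a traversal must pick a successor."""
    return any(len(successors) > 1 for successors in graph.values())


# A single read yields a simple chain with no branching.
simple_graph = de_bruijn(["ACGTC"], k=3)

# A second read sharing the prefix "ACG" makes node "CG" branch
# to both "GT" and "GA", so contigs through it are ambiguous.
branching_graph = de_bruijn(["ACGTC", "ACGAA"], k=3)
```

A contig assembled from the unbranched graph cannot be chimeric by construction, whereas one that passes through a branching node may join sequence from different transcripts.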
5
Metatranscriptomic Analysis of Bacterial Communities on Laundered Textiles: A Pilot Case Study. Microorganisms 2021; 9:microorganisms9081591. [PMID: 34442670 PMCID: PMC8400938 DOI: 10.3390/microorganisms9081591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022]
Abstract
Microbially contaminated washing machines and mild laundering conditions facilitate the survival and growth of microorganisms on laundry, promoting undesired side effects such as malodor formation. Clearly, a deeper understanding of the functionality and hygienic relevance of the laundry microbiota necessitates the analysis of the microbial gene expression on textiles after washing, which—to the best of our knowledge—has not been performed before. In this pilot case study, we used single-end RNA sequencing to generate de novo transcriptomes of the bacterial communities remaining on polyester and cotton fabrics washed in a domestic washing machine in mild conditions and subsequently incubated under moist conditions for 72 h. Two common de novo transcriptome assemblers were used. The final assemblies included 22,321 Trinity isoforms and 12,600 Spades isoforms. A large part of these isoforms could be assigned to the SwissProt database, and was further categorized into “molecular function”, “biological process” and “cellular component” using Gene Ontology (GO) terms. In addition, differential gene expression was used to show the difference in the pairwise comparison of the two tissue types. When comparing the assemblies generated with the two assemblers, the annotation results were relatively similar. However, there were clear differences between the de novo assemblies regarding differential gene expression.
6
Heo Y, Manikandan G, Ramachandran A, Chen D. Comprehensive Evaluation of Error-Correction Methodologies for Genome Sequencing Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
7
Spillane JL, LaPolice TM, MacManes MD, Plachetzki DC. Signal, bias, and the role of transcriptome assembly quality in phylogenomic inference. BMC Ecol Evol 2021; 21:43. [PMID: 33726665 PMCID: PMC7968300 DOI: 10.1186/s12862-021-01772-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/05/2020] [Accepted: 03/03/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Phylogenomic approaches have great power to reconstruct evolutionary histories; however, they rely on multi-step processes in which each stage has the potential to affect the accuracy of the final result. Many studies have empirically tested and established methodology for resolving robust phylogenies, including selecting appropriate evolutionary models, identifying orthologs, or isolating partitions with strong phylogenetic signal. However, few have investigated errors that may be initiated at earlier stages of the analysis. Biases introduced during the generation of the phylogenomic dataset itself could produce downstream effects on analyses of evolutionary history. Transcriptomes are widely used in phylogenomics studies, though there is little understanding of how a poor-quality assembly of these datasets could impact the accuracy of phylogenomic hypotheses. Here we examined how transcriptome assembly quality affects phylogenomic inferences by creating independent datasets from the same input data representing high-quality and low-quality transcriptome assembly outcomes.
RESULTS: By studying the performance of phylogenomic datasets derived from alternative high- and low-quality assembly inputs in a controlled experiment, we show that high-quality transcriptomes produce richer phylogenomic datasets with a greater number of unique partitions than low-quality assemblies. High-quality assemblies also give rise to partitions that have lower alignment ambiguity and less compositional bias. In addition, high-quality partitions hold stronger phylogenetic signal than their low-quality transcriptome assembly counterparts in both concatenation- and coalescent-based analyses.
CONCLUSIONS: Our findings demonstrate the importance of transcriptome assembly quality in phylogenomic analyses and suggest that a portion of the uncertainty observed in such studies could be alleviated at the assembly stage.
Affiliation(s)
- Jennifer L Spillane
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, 03824, USA
- Troy M LaPolice
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, 03824, USA
- Matthew D MacManes
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, 03824, USA
- David C Plachetzki
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH, 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH, 03824, USA
8
Eggertsen M, Tano SA, Chacin DH, Eklöf JS, Larsson J, Berkström C, Buriyo AS, Halling C. Different environmental variables predict distribution and cover of the introduced red seaweed Eucheuma denticulatum in two geographical locations. Biol Invasions 2020. [DOI: 10.1007/s10530-020-02417-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
In this study we examined abiotic and biotic factors that could potentially influence the presence of a non-indigenous seaweed, Eucheuma denticulatum, in two locations, one outside (Kane’ohe Bay, Hawai’i, USA) and one within (Mafia Island, Tanzania) its natural geographical range. We hypothesized that the availability of hard substrate and the amount of wave exposure would explain distribution patterns, and that higher abundance of herbivorous fishes in Tanzania would exert stronger top–down control than in Hawai’i. To address these hypotheses, we surveyed E. denticulatum in sites subjected to different environmental conditions and used generalized linear mixed models (GLMM) to identify predictors of E. denticulatum presence. We also estimated grazing intensity on E. denticulatum by surveying the type and the amount of grazing scars. Finally, we used molecular tools to distinguish between indigenous and non-indigenous strains of E. denticulatum on Mafia Island. In Kane’ohe Bay, the likelihood of finding E. denticulatum increased with wave exposure, whereas on Mafia Island, the likelihood increased with cover of coral rubble, and decreased with distance from areas of introduction (AOI), but this decrease was less pronounced in the presence of coral rubble. Grazing intensity was higher in Kane’ohe Bay than on Mafia Island. However, we still suggest that efforts to reduce non-indigenous E. denticulatum should include protection of important herbivores in both sites because of the high levels of grazing close to AOI. Moreover, we recommend that areas with hard substrate and high structural complexity should be avoided when farming non-indigenous strains of E. denticulatum.
9
The Utility of Genomic and Transcriptomic Data in the Construction of Proxy Protein Sequence Databases for Unsequenced Tree Nuts. BIOLOGY 2020; 9:biology9050104. [PMID: 32438695 PMCID: PMC7284556 DOI: 10.3390/biology9050104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 05/07/2020] [Accepted: 05/12/2020] [Indexed: 01/04/2023]
Abstract
As the apparent incidence of tree nut allergies rises, the development of MS methods that accurately identify tree nuts in food is critical. However, analyses are limited by few available tree nut protein sequences. We assess the utility of translated genomic and transcriptomic data for library construction with Juglans regia, walnut, as a model. Extracted walnuts were subjected to nano-liquid chromatography-mass spectrometry (n-LC-MS/MS), and spectra were searched against databases made from a six-frame translation of the genome (6FT), a transcriptome, and three proteomes. Searches against proteomic databases yielded a variable number of peptides (1156-1275), and only ten additional unique peptides were identified in the 6FT database. Searches against a transcriptomic database yielded results similar to those of the National Center for Biotechnology Information (NCBI) proteome (1200 and 1275 peptides, respectively). Performance of the transcriptomic database was improved via the adjustment of RNA-Seq read processing methods, which increased the number of identified peptides which align to seed allergen proteins by ~20%. Together, these findings establish a path towards the construction of robust proxy protein databases for tree nut species and other non-model organisms.
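For readers unfamiliar with the six-frame translation (6FT) mentioned in this abstract, the sketch below shows the general technique: a DNA sequence is translated in three reading frames on the forward strand and three on the reverse complement. This is not the pipeline used in the study; it assumes the standard genetic code and an input containing only A, C, G and T, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translation(seq):
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse strand."""
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


frames = six_frame_translation("ATGGCC")
```

Because the true reading frame and strand of a gene are unknown in an unannotated genome, searching spectra against all six translations avoids relying on gene predictions, at the cost of a much larger database.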
10
De novo assembly and functional annotation of the heart + hemolymph transcriptome in the Caribbean spiny lobster Panulirus argus. Mar Genomics 2020; 54:100783. [PMID: 32414680 DOI: 10.1016/j.margen.2020.100783] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/31/2020] [Revised: 04/29/2020] [Accepted: 05/02/2020] [Indexed: 12/29/2022]
Abstract
The spiny lobster, Panulirus argus, is an ecologically relevant species in shallow water coral reefs and a target of the most lucrative fishery in the greater Caribbean region. This study reports, for the first time, the heart + hemolymph transcriptome of the Caribbean spiny lobster Panulirus argus assembled from short Illumina 150 bp PE raw reads. A total of 80,152,094 raw reads were assembled using the Oyster River Protocol pipeline. The assembly resulted in a total of 254,773 transcripts. Functional gene annotation was conducted using the software package 'dammit'. Lastly, gene enrichment analyses were conducted using the Gene Ontology (GO) and KEGG pathway (KAAS) databases. This resource will be of utmost importance in future research aiming at exploring the effect of local and regional anthropogenic disturbances, as well as global climate change, on the molecular physiology of this overexploited species.
11
Lachmann A, Clarke DJB, Torre D, Xie Z, Ma'ayan A. Interoperable RNA-Seq analysis in the cloud. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2020; 1863:194521. [PMID: 32156561 DOI: 10.1016/j.bbagrm.2020.194521] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/02/2019] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 12/25/2022]
Abstract
RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deployment in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs to handle the cloud alignment jobs. We next demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify suitable gene count quantification methods to facilitate cost-effective, accurate, and cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high read quality estimation with cost efficient runtime performance. HISAT2 is the fastest of the classical aligners with good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Alexander Lachmann
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Denis Torre
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
12
Williams TL, Senft SL, Yeo J, Martín-Martínez FJ, Kuzirian AM, Martin CA, DiBona CW, Chen CT, Dinneen SR, Nguyen HT, Gomes CM, Rosenthal JJC, MacManes MD, Chu F, Buehler MJ, Hanlon RT, Deravi LF. Dynamic pigmentary and structural coloration within cephalopod chromatophore organs. Nat Commun 2019; 10:1004. [PMID: 30824708 PMCID: PMC6397165 DOI: 10.1038/s41467-019-08891-x] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 07/20/2018] [Accepted: 01/23/2019] [Indexed: 01/08/2023]
Abstract
Chromatophore organs in cephalopod skin are known to produce ultra-fast changes in appearance for camouflage and communication. Light-scattering pigment granules within chromatocytes have been presumed to be the sole source of coloration in these complex organs. We report the discovery of structural coloration emanating in precise register with expanded pigmented chromatocytes. Concurrently, using an annotated squid chromatophore proteome together with microscopy, we identify a likely biochemical component of this reflective coloration as reflectin proteins distributed in sheath cells that envelop each chromatocyte. Additionally, within the chromatocytes, where the pigment resides in nanostructured granules, we find the lens protein Ω-crystallin interfacing tightly with pigment molecules. These findings offer fresh perspectives on the intricate biophotonic interplay between pigmentary and structural coloration elements tightly co-located within the same dynamic flexible organ, a feature that may help inspire the development of new classes of engineered materials that change color and pattern.
Affiliation(s)
- Thomas L Williams
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
- Stephen L Senft
  - The Eugene Bell Center, The Marine Biological Laboratory, Woods Hole, MA, 02543, USA
- Jingjie Yeo
  - Department of Biomedical Engineering, Tufts University, Medford, MA, 02155, USA; Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA; Institute of High Performance Computing, A*STAR, Singapore, 138632, Singapore
- Francisco J Martín-Martínez
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Alan M Kuzirian
  - The Eugene Bell Center, The Marine Biological Laboratory, Woods Hole, MA, 02543, USA
- Camille A Martin
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
- Christopher W DiBona
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
- Chun-Teh Chen
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Sean R Dinneen
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
- Hieu T Nguyen
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, 03824, USA
- Conor M Gomes
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
- Joshua J C Rosenthal
  - The Eugene Bell Center, The Marine Biological Laboratory, Woods Hole, MA, 02543, USA
- Matthew D MacManes
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, 03824, USA
- Feixia Chu
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, 03824, USA
- Markus J Buehler
  - Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Roger T Hanlon
  - The Eugene Bell Center, The Marine Biological Laboratory, Woods Hole, MA, 02543, USA
- Leila F Deravi
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA, 02115, USA
13
MacManes MD. The Oyster River Protocol: a multi-assembler and kmer approach for de novo transcriptome assembly. PeerJ 2018; 6:e5428. [PMID: 30083482 PMCID: PMC6078068 DOI: 10.7717/peerj.5428] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 08/22/2017] [Accepted: 07/21/2018] [Indexed: 11/24/2022]
Abstract
Characterizing transcriptomes in non-model organisms has resulted in a massive increase in our understanding of biological phenomena. This boon, largely made possible via high-throughput sequencing, means that studies of functional, evolutionary, and population genomics are now being done by hundreds or even thousands of labs around the world. For many, these studies begin with a de novo transcriptome assembly, which is a technically complicated process involving several discrete steps. The Oyster River Protocol (ORP), described here, implements a standardized and benchmarked set of bioinformatic processes, resulting in an assembly with enhanced qualities over other standard assembly methods. Specifically, ORP-produced assemblies have higher Detonate and TransRate scores and mapping rates, largely because the protocol leverages a multi-assembler and kmer assembly process, thereby bypassing the shortcomings of any one approach. These improvements are important, as previously unassembled transcripts are included in ORP assemblies, resulting in a significant enhancement of the power of downstream analysis. Further, as part of this study, I show that above 30 million reads, assembly quality is unrelated to the number of reads generated. Code Availability: The version controlled open-source code is available at https://github.com/macmanes-lab/Oyster_River_Protocol. Instructions for software installation and use, and other details are available at http://oyster-river-protocol.rtfd.org/.
Affiliation(s)
- Matthew D MacManes
  - Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
14
Timmermans MJTN, Thompson MJ, Collins S, Vogler AP. Independent evolution of sexual dimorphism and female-limited mimicry in swallowtail butterflies (Papilio dardanus and Papilio phorcas). Mol Ecol 2017; 26:1273-1284. [PMID: 28100020 DOI: 10.1111/mec.14012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/04/2016] [Revised: 12/09/2016] [Accepted: 01/03/2017] [Indexed: 11/29/2022]
Abstract
Several species of swallowtail butterflies (genus Papilio) are Batesian mimics that express multiple mimetic female forms, while the males are monomorphic and nonmimetic. The evolution of such sex-limited mimicry may involve sexual dimorphism arising first and mimicry subsequently. Such a stepwise scenario through a nonmimetic, sexually dimorphic stage has been proposed for two closely related sexually dimorphic species: Papilio phorcas, a nonmimetic species with two female forms, and Papilio dardanus, a female-limited polymorphic mimetic species. Their close relationship indicates that female-limited polymorphism could be a shared derived character of the two species. Here, we present a phylogenomic analysis of the dardanus group using 3964 nuclear loci and whole mitochondrial genomes, showing that they are not sister species and thus that the sexually dimorphic state has arisen independently in the two species. Nonhomology of the female polymorphism in both species is supported by population genetic analysis of engrailed, the presumed mimicry switch locus in P. dardanus. McDonald-Kreitman tests performed on SNPs in engrailed showed the signature of balancing selection in a polymorphic population of P. dardanus, but not in monomorphic populations, nor in the nonmimetic P. phorcas. Hence, the wing polymorphism does not balance polymorphisms in engrailed in P. phorcas. Equally, unlike in P. dardanus, none of the SNPs in P. phorcas engrailed were associated with either female morph. We conclude that sexual dimorphism due to female polymorphism evolved independently in both species from monomorphic, nonmimetic states. While sexual selection may drive male-female dimorphism in nonmimetic species, in mimetic Papilios, natural selection for protection from predators in females is an alternative route to sexual dimorphism.
Affiliation(s)
- M J T N Timmermans
- Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK; Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, SL5 7PY, UK
| | - M J Thompson
- Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK; Department of Zoology, Cambridge University, Downing Street, Cambridge, CB2 3EJ, UK
| | - S Collins
- ABRI, PO Box 14308, Westlands, 0800, Nairobi, Kenya
| | - A P Vogler
- Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK; Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot, SL5 7PY, UK
| |
|
15
|
Macrander J, Broe M, Daly M. Tissue-Specific Venom Composition and Differential Gene Expression in Sea Anemones. Genome Biol Evol 2016; 8:2358-75. [PMID: 27389690 PMCID: PMC5010892 DOI: 10.1093/gbe/evw155] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Accepted: 06/14/2016] [Indexed: 12/19/2022] Open
Abstract
Cnidarians represent one of the few groups of venomous animals that lack a centralized venom transmission system. Instead, they are equipped with stinging capsules collectively known as nematocysts. Nematocysts vary in abundance and type across different tissues; however, the venom composition in most species remains unknown. Depending on the tissue type, the venom composition in sea anemones may be vital for predation, defense, or digestion. Using a tissue-specific RNA-seq approach, we characterize the venom assemblage in the tentacles, mesenterial filaments, and column for three species of sea anemone (Anemonia sulcata, Heteractis crispa, and Megalactis griffithsi). These taxa vary with regard to inferred venom potency, symbiont abundance, and nematocyst diversity. We show that there is significant variation in abundance of toxin-like genes across tissues and species. Although the cumulative toxin abundance for the column was consistently the lowest, contributions to the overall toxin assemblage varied considerably among tissues for different toxin types. Our gene ontology (GO) analyses also show sharp contrasts between conserved GO groups emerging from whole transcriptome analysis and tissue-specific expression among GO groups in our differential expression analysis. This study provides a framework for future characterization of tissue-specific venom and other functionally important genes in this lineage of simple-bodied animals.
Affiliation(s)
- Jason Macrander
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University
| | - Michael Broe
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University
| | - Marymegan Daly
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University
| |
|
16
|
A new transcriptome and transcriptome profiling of adult and larval tissue in the box jellyfish Alatina alata: an emerging model for studying venom, vision and sex. BMC Genomics 2016; 17:650. [PMID: 27535656 PMCID: PMC4989536 DOI: 10.1186/s12864-016-2944-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 05/18/2016] [Accepted: 07/18/2016] [Indexed: 12/28/2022] Open
Abstract
Background: Cubozoans (box jellyfish) are cnidarians that have evolved a number of distinguishing features. Many cubozoans have a particularly potent sting, effected by stinging structures called nematocysts; cubozoans have well-developed light sensation, possessing both image-forming lens eyes and light-sensitive eye spots; and some cubozoans have complex mating behaviors, including aggregations, copulation and internal fertilization. The cubozoan Alatina alata is emerging as a cnidarian model because it forms predictable monthly nearshore breeding aggregations in tropical to subtropical waters worldwide, making both adult and larval material reliably accessible. To develop resources for A. alata, this study generated a functionally annotated transcriptome of adult and larval tissue, applying preliminary differential expression analyses to identify candidate genes involved in nematogenesis and venom production, vision and extraocular sensory perception, and sexual reproduction, which for brevity we refer to as “venom”, “vision” and “sex”. Results: We assembled a transcriptome de novo from RNA-Seq data pooled from multiple body parts (gastric cirri, ovaries, tentacle (with pedalium base) and rhopalium) of an adult female A. alata medusa and larval planulae. Our transcriptome comprises ~32 K transcripts, after filtering, and provides a basis for analyzing patterns of gene expression in adult and larval box jellyfish tissues. Furthermore, we annotated a large set of candidate genes putatively involved in venom, vision and sex, providing an initial molecular characterization of these complex features in cubozoans. Expression profiles and gene tree reconstruction provided a number of preliminary insights into the putative sites of nematogenesis and venom production, regions of phototransduction activity and fertilization dynamics in A. alata. Conclusions: Our Alatina alata transcriptome significantly adds to the genomic resources for this emerging cubozoan model.
This study provides the first annotated transcriptome from multiple tissues of a cubozoan focusing on both the adult and larvae. Our approach of using multiple body parts and life stages to generate this transcriptome effectively identified a broad range of candidate genes for the further study of coordinated processes associated with venom, vision and sex. This new genomic resource and the candidate gene dataset are valuable for further investigating the evolution of distinctive features of cubozoans, and of cnidarians more broadly.
|
18
|
Priyam M, Tripathy M, Rai U, Ghorai SM. Tracing the evolutionary lineage of pattern recognition receptor homologues in vertebrates: An insight into reptilian immunity via de novo sequencing of the wall lizard splenic transcriptome. Vet Immunol Immunopathol 2016; 172:26-37. [DOI: 10.1016/j.vetimm.2016.03.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/17/2015] [Revised: 03/01/2016] [Accepted: 03/02/2016] [Indexed: 10/22/2022]
|
19
|
Vincent AT, Derome N, Boyle B, Culley AI, Charette SJ. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. J Microbiol Methods 2016; 138:60-71. [PMID: 26995332 DOI: 10.1016/j.mimet.2016.02.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 09/30/2015] [Revised: 01/26/2016] [Accepted: 02/24/2016] [Indexed: 12/16/2022]
Abstract
The Sanger sequencing method produces relatively long DNA sequences of unmatched quality and has long been considered the gold standard for sequencing DNA. Many improvements to the Sanger method, culminating in fluorescent dyes coupled with automated capillary electrophoresis, enabled the sequencing of the first genomes. Nevertheless, using this technology to sequence whole genomes was costly, laborious and time consuming, even for relatively small genomes. A major technological advance was the introduction of next-generation sequencing (NGS), pioneered by 454 Life Sciences in the early part of the 21st century. NGS allowed scientists to sequence thousands to millions of DNA molecules in a single machine run. Since then, new NGS technologies have emerged and existing NGS platforms have been improved, enabling the production of genome sequences at an unprecedented rate as well as broadening the spectrum of NGS applications. The current affordability of generating genomic information, especially with microbial samples, has resulted in a false sense of simplicity that belies the fact that many researchers still consider these technologies a black box. In this review, our objective is to identify and discuss four steps that we consider crucial to the success of any NGS-related project. These steps are: (1) the definition of the research objectives beyond sequencing and appropriate experimental planning, (2) library preparation, (3) sequencing and (4) data analysis. The goal of this review is to give an overview of the process, from sample to analysis, and discuss how to optimize your resources to achieve the most from your NGS-based research. Regardless of the evolution and improvement of the sequencing technologies, these four steps will remain relevant.
Affiliation(s)
- Antony T Vincent
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada
| | - Nicolas Derome
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biologie, Faculté des sciences et de génie, Université Laval, Quebec City G1V 0A6, Canada
| | - Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Alexander I Culley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Groupe de Recherche en Écologie Buccale (GREB), Faculté de médecine dentaire, Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Steve J Charette
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada.
| |
|
20
|
Richardson MF, Sherman CDH. De Novo Assembly and Characterization of the Invasive Northern Pacific Seastar Transcriptome. PLoS One 2015; 10:e0142003. [PMID: 26529321 PMCID: PMC4631335 DOI: 10.1371/journal.pone.0142003] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/20/2015] [Accepted: 10/15/2015] [Indexed: 12/11/2022] Open
Abstract
Invasive species are a major threat to global biodiversity but can also serve as valuable model systems to examine important evolutionary processes. While the ecological aspects of invasions have been well documented, the study of the genetic basis of adaptive change during the invasion process has been hampered by a lack of genomic resources for the majority of invasive species. Here we report the first larval transcriptomic resource for the Northern Pacific Seastar, Asterias amurensis, an invasive marine predator in Australia. Approximately 117.5 million 100 base-pair (bp) paired-end reads were sequenced from a single RNA-Seq library from a pooled set of full-sibling A. amurensis bipinnaria larvae. We evaluated the efficacy of a pre-assembly error correction pipeline on subsequent de novo assembly. Error correction resulted in small but important improvements to the final assembly in terms of mapping statistics and core eukaryotic gene representation. The error-corrected de novo assembly resulted in 115,654 contigs after redundancy clustering. Of these, 41,667 assembled contigs were homologous to sequences from NCBI’s non-redundant protein and UniProt databases. We assigned Gene Ontology, KEGG Orthology, Pfam protein domain terms and predicted protein-coding sequences to > 36,000 contigs. The final transcriptome dataset generated here provides functional information for 18,319 unique proteins, comprising at least 11,355 expressed genes. Furthermore, we identified 9,739 orthologs to P. miniata proteins, evaluated our annotation pipeline and generated a list of 150 candidate genes for responses to several environmental stressors that may be important for adaptation of A. amurensis in the invasive range. Our study has produced a large set of A. amurensis RNA contigs with functional annotations that can serve as a resource for future comparisons to other echinoderm transcriptomes and gene expression studies.
Our data can be used to study the genetic basis of adaptive change and other important evolutionary processes during a successful invasion.
Affiliation(s)
- Mark F. Richardson
- Deakin University, Geelong, Australia. School of Life and Environmental Sciences, Centre for Integrative Ecology, (Waurn Ponds Campus). 75 Pigdons Road. Locked Bag 20000, Geelong, VIC 3220, Australia
| | - Craig D. H. Sherman
- Deakin University, Geelong, Australia. School of Life and Environmental Sciences, Centre for Integrative Ecology, (Waurn Ponds Campus). 75 Pigdons Road. Locked Bag 20000, Geelong, VIC 3220, Australia
| |
|
21
|
De Wit P, Pespeni MH, Palumbi SR. SNP genotyping and population genomics from expressed sequences - current advances and future possibilities. Mol Ecol 2015; 24:2310-23. [DOI: 10.1111/mec.13165] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 12/13/2014] [Revised: 03/13/2015] [Accepted: 03/18/2015] [Indexed: 02/01/2023]
Affiliation(s)
- Pierre De Wit
- Department of Biology and Environmental Sciences; University of Gothenburg; Sven Lovén Centre for Marine Science - Tjärnö; Hättebäcksvägen 7 Strömstad SE-452 96 Sweden
| | - Melissa H. Pespeni
- Department of Biology; University of Vermont; Marsh Life Science; Rm 326A 109 Carrigan Drive Burlington VT 05405 USA
| | - Stephen R. Palumbi
- Department of Biology; Stanford University; Hopkins Marine Station 120 Ocean view Blvd. Pacific Grove CA 93950 USA
| |
|
22
|
Paschold A, Larson NB, Marcon C, Schnable JC, Yeh CT, Lanz C, Nettleton D, Piepho HP, Schnable PS, Hochholdinger F. Nonsyntenic genes drive highly dynamic complementation of gene expression in maize hybrids. Plant Cell 2014; 26:3939-48. [PMID: 25315323 PMCID: PMC4247586 DOI: 10.1105/tpc.114.130948] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 08/12/2014] [Revised: 09/05/2014] [Accepted: 09/24/2014] [Indexed: 05/19/2023]
Abstract
Maize (Zea mays) displays an exceptional level of structural genomic diversity, which is likely unique among higher eukaryotes. In this study, we surveyed how the genetic divergence of two maize inbred lines affects the transcriptomic landscape in four different primary root tissues of their F1-hybrid progeny. An extreme instance of complementation was frequently observed: genes that were expressed in only one parent but in both reciprocal hybrids. This single-parent expression (SPE) pattern was detected for 2341 genes with up to 1287 SPE patterns per tissue. As a consequence, the number of active genes in hybrids exceeded that of their parents in each tissue by >400. SPE patterns are highly dynamic, as illustrated by their excessive degree of tissue specificity (80%). The biological significance of this type of complementation is underpinned by the observation that a disproportionally high number of SPE genes (75 to 82%) is nonsyntenic, as opposed to all expressed genes (36%). These genes likely evolved after the last whole-genome duplication and are therefore younger than the syntenic genes. In summary, SPE genes shape the remarkable gene expression plasticity between root tissues and complementation in maize hybrids, resulting in a tissue-specific increase of active genes in F1-hybrids compared with their inbred parents.
Affiliation(s)
- Anja Paschold
- Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
| | - Nick B Larson
- Department of Statistics, Iowa State University, Ames, Iowa 50011-1210
| | - Caroline Marcon
- Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
| | - James C Schnable
- Department of Agronomy and Horticulture, University of Nebraska, Lincoln, Nebraska 68588
| | - Cheng-Ting Yeh
- Department of Agronomy and Center for Plant Genomics, Iowa State University, Ames, Iowa 50011-3650
| | - Christa Lanz
- Department of Molecular Biology, Max-Planck-Institute for Developmental Biology, 72076 Tuebingen, Germany
| | - Dan Nettleton
- Department of Statistics, Iowa State University, Ames, Iowa 50011-1210
| | - Hans-Peter Piepho
- Institute for Crop Science, Biostatistics Unit, University of Hohenheim, 70599 Stuttgart, Germany
| | - Patrick S Schnable
- Department of Agronomy and Center for Plant Genomics, Iowa State University, Ames, Iowa 50011-3650
| | - Frank Hochholdinger
- Institute of Crop Science and Resource Conservation, Crop Functional Genomics, University of Bonn, 53113 Bonn, Germany
| |
|
23
|
Harris SE, O'Neill RJ, Munshi-South J. Transcriptome resources for the white-footed mouse (Peromyscus leucopus): new genomic tools for investigating ecologically divergent urban and rural populations. Mol Ecol Resour 2014; 15:382-94. [PMID: 24980186 DOI: 10.1111/1755-0998.12301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 04/23/2014] [Revised: 06/26/2014] [Accepted: 06/27/2014] [Indexed: 12/30/2022]
Abstract
Genomic resources are important and attainable for examining evolutionary change in divergent natural populations of nonmodel species. We utilized two next-generation sequencing (NGS) platforms, 454 and SOLiD 5500XL, to assemble low-coverage transcriptomes of the white-footed mouse (Peromyscus leucopus), a widespread and abundant native rodent in eastern North America. We sequenced liver mRNA transcripts from multiple individuals collected from urban populations in New York City and rural populations in undisturbed protected areas nearby and assembled a reference transcriptome using 1 080 065 954 SOLiD 5500XL (75 bp) reads and 3 052 640 reads from the 454 GS FLX+ platform. The reference contained 40 908 contigs with an N50 of 1044 bp and a total content of 30.06 megabases (Mb). Contigs were annotated against Mus musculus UniProt databases (39.96% annotated). We identified 104 655 high-quality single nucleotide polymorphisms (SNPs) and 65 simple sequence repeats (SSRs) with flanking primers. We also used normalized read counts to identify putative gene expression differences in 10 genes between populations. There were 19 contigs significantly differentially expressed in urban populations compared to rural populations, with gene function annotations generally related to the translation and modification of proteins and those involved in immune responses. The individual transcriptomes generated in this study will be used to investigate evolutionary responses to urbanization. The reference transcriptome provides a valuable resource for the scientific community using North American Peromyscus species as emerging model systems for ecological genetics and adaptation.
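The abstract above reports an N50 for the reference assembly. As a minimal sketch of how that statistic is usually computed (the function name and the toy contig lengths are illustrative assumptions, not data or code from the study):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

With the toy lengths, the total is 1500 bases; the two longest contigs (500 and 400) already cover 900 bases, so the function returns 400.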
Affiliation(s)
- Stephen E Harris
- Program in Ecology, Evolutionary Biology, & Behavior, The Graduate Center, City University of New York (CUNY), New York, NY, 10016, USA
| | | | | |
|
24
|
Chang Z, Wang Z, Li G. The impacts of read length and transcriptome complexity for de novo assembly: a simulation study. PLoS One 2014; 9:e94825. [PMID: 24736633 PMCID: PMC3988101 DOI: 10.1371/journal.pone.0094825] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 12/17/2013] [Accepted: 03/19/2014] [Indexed: 11/22/2022] Open
Abstract
Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve transcriptome assembly, but it is unknown whether the read length really matters. In addition, though many assembly tools are available now, it is unclear whether the existing assemblers perform well enough for all data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and interestingly, the threshold is distinct in different organisms; and (2) the quality of de novo assembly decreases sharply with the increase of transcriptome complexity: all existing de novo assemblers tend to fail whenever the genes contain a large number of alternative splicing events.
Affiliation(s)
- Zheng Chang
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Zhenjia Wang
- School of Mathematics, Shandong University, Jinan, Shandong, China
| | - Guojun Li
- School of Mathematics, Shandong University, Jinan, Shandong, China
| |
|
25
|
Wang C, Grohme MA, Mali B, Schill RO, Frohme M. Towards decrypting cryptobiosis--analyzing anhydrobiosis in the tardigrade Milnesium tardigradum using transcriptome sequencing. PLoS One 2014; 9:e92663. [PMID: 24651535 PMCID: PMC3961413 DOI: 10.1371/journal.pone.0092663] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 10/17/2013] [Accepted: 02/25/2014] [Indexed: 11/18/2022] Open
Abstract
Background: Many tardigrade species are capable of anhydrobiosis; however, mechanisms underlying their extreme desiccation resistance remain elusive. This study attempts to quantify the anhydrobiotic transcriptome of the limno-terrestrial tardigrade Milnesium tardigradum. Results: A prerequisite for differential gene expression analysis was the generation of a reference hybrid transcriptome atlas by assembly of Sanger, 454 and Illumina sequence data. The final assembly yielded 79,064 contigs (>100 bp) after removal of ribosomal RNAs. Around 50% of them could be annotated by SwissProt and NCBI non-redundant protein sequences. Analysis using CEGMA predicted 232 (93.5%) out of the 248 highly conserved eukaryotic genes in the assembly. We used this reference transcriptome for mapping and quantifying the expression of transcripts regulated under anhydrobiosis in a time-series during dehydration and rehydration. Of the transcripts, 834 were found to be differentially expressed in a single stage (dehydration/inactive tun/rehydration), 184 were overlapping in two stages, and 74 were differentially expressed in all three stages. We have found interesting patterns of differentially expressed transcripts that are in concordance with a common hypothesis of metabolic shutdown during anhydrobiosis. This included down-regulation of several proteins of the DNA replication and translational machinery and protein degradation. Among others, heat shock proteins Hsp27 and Hsp30c were up-regulated in response to dehydration and rehydration. In addition, we observed up-regulation of polyubiquitin-B upon rehydration together with a higher expression level of several DNA repair proteins during rehydration than in the dehydration stage. Conclusions: Most of the transcripts identified to be differentially expressed had distinct cellular function. Our data suggest a concerted molecular adaptation in M. tardigradum that permits extreme forms of ametabolic states such as anhydrobiosis.
It is tempting to surmise that the desiccation tolerance of tardigrades can be achieved by a constitutive cellular protection system, probably in conjunction with other mechanisms such as rehydration-induced cellular repair.
Affiliation(s)
- Chong Wang
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
| | - Markus A. Grohme
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
| | - Brahim Mali
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
| | - Ralph O. Schill
- Biological Institute, Zoology, University of Stuttgart, Stuttgart, Germany
| | - Marcus Frohme
- Molecular Biotechnology and Functional Genomics, Technical University of Applied Sciences Wildau, Wildau, Germany
| |
|
26
|
Zhou X, Rokas A. Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Mol Ecol 2014; 23:1679-700. [DOI: 10.1111/mec.12680] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/04/2013] [Revised: 01/17/2014] [Accepted: 01/22/2014] [Indexed: 12/17/2022]
Affiliation(s)
- Xiaofan Zhou
- Department of Biological Sciences; Vanderbilt University; Nashville TN 37235 USA
| | - Antonis Rokas
- Department of Biological Sciences; Vanderbilt University; Nashville TN 37235 USA
| |
|
27
|
Mbandi SK, Hesse U, Rees DJG, Christoffels A. A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads. Front Genet 2014; 5:17. [PMID: 24575122 PMCID: PMC3921913 DOI: 10.3389/fgene.2014.00017] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 11/21/2013] [Accepted: 01/19/2014] [Indexed: 11/13/2022] Open
Abstract
Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on the associated base quality scores to retain the read, or a portion of it, when the score is above a predefined threshold. Without a reference, it is difficult to differentiate sequencing error from biological variation using quality scores alone. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the view that de novo assembly of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enable selection of the optimum de novo transcriptome assembly in non-model organisms.
Affiliation(s)
- Stanley Kimbung Mbandi
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape Bellville, South Africa
| | - Uljana Hesse
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape Bellville, South Africa
| | - D Jasper G Rees
- Biotechnology Platform, Agricultural Research Council Onderstepoort, South Africa
| | - Alan Christoffels
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape Bellville, South Africa
| |
|
28
|
Macmanes MD. On the optimal trimming of high-throughput mRNA sequence data. Front Genet 2014; 5:13. [PMID: 24567737 PMCID: PMC3908319 DOI: 10.3389/fgene.2014.00013] [Citation(s) in RCA: 123] [Impact Index Per Article: 12.3] [Received: 11/14/2013] [Accepted: 01/14/2014] [Indexed: 01/19/2023] Open
Abstract
The widespread and rapid adoption of high-throughput sequencing technologies has afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change, and perhaps more importantly, the links between genotype and phenotype. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score <2 or <5, is optimal for most studies across a wide variety of metrics.
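To make the idea of gentle quality trimming concrete, here is a minimal sketch of 3'-end trimming at a Phred threshold. It assumes Phred+33 (Sanger/Illumina 1.8+) quality encoding; the function name, the simple trailing-base rule, and the example read are illustrative assumptions and do not reproduce the trimming software evaluated in the study.

```python
def trim_3prime(seq, qual, threshold=5, offset=33):
    """Remove trailing bases whose Phred score is below `threshold`.

    `qual` is the FASTQ quality string; `offset` is the ASCII offset
    of the encoding (33 is assumed here).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk in from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: 'I' encodes Q40; '#', '$', '%' encode Q2, Q3, Q4.
read_seq = "ACGTACGT"
read_qual = "IIIII#$%"
print(trim_3prime(read_seq, read_qual, threshold=5))
print(trim_3prime(read_seq, read_qual, threshold=2))
```

With a threshold of 5 the three low-quality trailing bases are removed; with a threshold of 2 the read is left intact, because the final base (Q4) already meets the cutoff.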
Affiliation(s)
- Matthew D Macmanes
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA; Hubbard Center for Genome Studies, Durham, NH, USA
| |
|
29
|
Alamancos GP, Agirre E, Eyras E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol Biol 2014; 1126:357-97. [PMID: 24549677 DOI: 10.1007/978-1-62703-980-2_26] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 12/02/2022]
Abstract
The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data, which could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.
Affiliation(s)
- Gael P Alamancos
- Computational Genomics, Universitat Pompeu Fabra, Barcelona, Spain
| | | | | |
|