1
Xia LC, Ai D, Lee H, Andor N, Li C, Zhang NR, Ji HP. SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. Gigascience 2018; 7:5049476. [PMID: 29982625 PMCID: PMC6057526 DOI: 10.1093/gigascience/giy081] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/22/2018] [Revised: 05/22/2018] [Accepted: 06/26/2018] [Indexed: 11/29/2022] Open
Abstract
Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, only a few data simulators provide structural variants in silico, and even fewer provide variants with different allelic fractions and haplotypes. Findings: We developed SVEngine, an open-source tool to address this need. SVEngine simulates next-generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file, and/or a clonal phylogeny tree file (NEWICK). It then simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs), and/or post-alignment files (BAMs). All of the output files contain the desired variants and are accompanied by BED files containing the ground truth. SVEngine's flexible design enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions, and translocations. Finally, SVEngine simulates sequence data that replicate the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce the simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime in comparisons with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift, and neighboring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use.
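As a rough illustration of the kind of operation the abstract describes (embedding variants in a template sequence and recording the ground truth in BED form), a minimal Python sketch follows. It is not SVEngine's code or interface: the Variant class, apply_variants, truth_bed, the toy sequence, and the chr1 name are all hypothetical, and only non-overlapping deletions and inversions on a single haplotype are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record using BED-style coordinates."""
    kind: str   # "DEL" or "INV"
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_variants(template: str, variants: list[Variant]) -> str:
    """Return a copy of `template` with non-overlapping variants embedded."""
    out = template
    # Work from right to left so that upstream coordinates remain valid
    # after a deletion shortens the sequence.
    for v in sorted(variants, key=lambda v: v.start, reverse=True):
        segment = out[v.start:v.end]
        if v.kind == "DEL":
            replacement = ""
        elif v.kind == "INV":
            replacement = reverse_complement(segment)
        else:
            raise ValueError(f"unsupported variant type: {v.kind}")
        out = out[:v.start] + replacement + out[v.end:]
    return out


def truth_bed(chrom: str, variants: list[Variant]) -> str:
    """Format the variants as tab-separated BED lines (template coordinates)."""
    rows = sorted(variants, key=lambda v: v.start)
    return "\n".join(f"{chrom}\t{v.start}\t{v.end}\t{v.kind}" for v in rows)


# Toy example: delete bases 4..8 and invert bases 8..12 of a 16-base template.
template = "AAAACCCCGGAATTTT"
variants = [Variant("DEL", 4, 8), Variant("INV", 8, 12)]
mutated = apply_variants(template, variants)
bed = truth_bed("chr1", variants)
```

Here the BED rows are kept in template coordinates, so the truth set can be compared directly with calls made against the original reference. A real simulator would also need to model allelic fraction, haplotypes, and read sampling, none of which this sketch attempts.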
Affiliation(s)
- Li Charlie Xia
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
  - Department of Statistics, the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 18014
- Dongmei Ai
  - School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083 P. R. China
- Hojoon Lee
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
- Noemi Andor
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
- Chao Li
  - School of Mathematics and Physics, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083 P. R. China
- Nancy R Zhang
  - Department of Statistics, the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 18014
- Hanlee P Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, 269 Campus Drive, Stanford, CA 94305
  - Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304
2
Yan H, Cai H, Guan Q, He J, Zhang J, Guo Y, Huang H, Li X, Li Y, Gu Y, Qi L, Guo Z. Individualized analysis of differentially expressed miRNAs with application to the identification of miRNAs deregulated commonly in lung cancer tissues. Brief Bioinform 2017; 19:793-802. [DOI: 10.1093/bib/bbx015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/15/2016] [Indexed: 01/10/2023] Open
Affiliation(s)
- Haidan Yan
  - Department of Bioinformatics, Fujian Medical University, China
- Hao Cai
  - Department of Bioinformatics, Fujian Medical University, China
- Qingzhou Guan
  - Department of Bioinformatics, Fujian Medical University, China
- Jun He
  - Department of Bioinformatics, Fujian Medical University, China
- Juan Zhang
  - Department of Bioinformatics, Fujian Medical University, China
- You Guo
  - Department of Preventive Medicine, Gannan Medical University, China
- Haiyan Huang
  - Department of Bioinformatics, Fujian Medical University, China
- Xiangyu Li
  - Department of Bioinformatics, Fujian Medical University, China
- Yawei Li
  - Department of Bioinformatics, Fujian Medical University, China
- Yunyan Gu
  - Department of Bioinformatics, Harbin Medical University, China
- Lishuang Qi
  - Department of Bioinformatics, Fujian Medical University, China
- Zheng Guo
  - Department of Bioinformatics, Fujian Medical University, China
  - Department of Bioinformatics, Harbin Medical University, China
3
Puri KD, Yan C, Leng Y, Zhong S. RNA-Seq Revealed Differences in Transcriptomes between 3ADON and 15ADON Populations of Fusarium graminearum In Vitro and In Planta. PLoS One 2016; 11:e0163803. [PMID: 27788144 PMCID: PMC5082872 DOI: 10.1371/journal.pone.0163803] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/18/2016] [Accepted: 09/14/2016] [Indexed: 01/24/2023] Open
Abstract
Fusarium graminearum is the major causal agent of Fusarium head blight (FHB) in barley and wheat in North America. The fungus not only causes yield loss of the crops but also produces harmful trichothecene mycotoxins [deoxynivalenol (DON) and its derivatives 3-acetyldeoxynivalenol (3ADON) and 15-acetyldeoxynivalenol (15ADON), and nivalenol (NIV)] that contaminate grains. Previous studies showed a dramatic increase of 3ADON-producing isolates with higher aggressiveness and DON production than the 15ADON-producing isolates in North America. However, the genetic and molecular basis of differences between the two types of isolates is unclear. In this study, we compared transcriptomes of the 3ADON and 15ADON isolates in vitro (in culture media) and in planta (during infection on the susceptible wheat cultivar 'Briggs') using RNA-sequencing. The in vitro gene expression comparison identified 479 up-regulated and 801 down-regulated genes in the 3ADON isolates; the up-regulated genes were mainly involved in C-compound and carbohydrate metabolism (18.6%), polysaccharide metabolism (7.7%) or were of unknown functions (57.6%). The in planta gene expression analysis revealed that 185, 89, and 62 genes were up-regulated in the 3ADON population at 48, 96, and 144 hours after inoculation (HAI), respectively. The up-regulated genes were significantly enriched in functions for cellular import, C-compound and carbohydrate metabolism, allantoin and allantoate transport at 48 HAI, for detoxification and virulence at 96 HAI, and for metabolism of acetic acid derivatives, detoxification, and cellular import at 144 HAI. Comparative analyses of in planta versus in vitro gene expression further revealed 2,159, 1,981 and 2,095 genes up-regulated in the 3ADON isolates, and 2,415, 2,059 and 1,777 genes up-regulated in the 15ADON isolates at the three time points after inoculation. Collectively, our data provide a foundation for further understanding of molecular mechanisms involved in aggressiveness and DON production of the two chemotype isolates of F. graminearum.
Affiliation(s)
- Krishna D. Puri
  - Department of Plant Pathology, North Dakota State University, Fargo, ND, United States of America
- Changhui Yan
  - Department of Computer Science, North Dakota State University, Fargo, ND, United States of America
- Yueqiang Leng
  - Department of Plant Pathology, North Dakota State University, Fargo, ND, United States of America
- Shaobin Zhong
  - Department of Plant Pathology, North Dakota State University, Fargo, ND, United States of America
4
Tarazona S, Furió-Tarí P, Turrà D, Pietro AD, Nueda MJ, Ferrer A, Conesa A. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res 2015; 43:e140. [PMID: 26184878 PMCID: PMC4666377 DOI: 10.1093/nar/gkv711] [Citation(s) in RCA: 361] [Impact Index Per Article: 40.1] [Received: 04/21/2015] [Accepted: 07/01/2015] [Indexed: 12/14/2022] Open
Abstract
As RNA-seq has grown in popularity, there is increasing awareness of the importance of experimental design, bias removal, accurate quantification and control of false positives for proper data analysis. We introduce the NOISeq R-package for quality control and analysis of count data. We show how the available diagnostic tools can be used to monitor quality issues, make pre-processing decisions and improve analysis. We demonstrate that the non-parametric NOISeqBIO efficiently controls false discoveries in experiments with biological replication and outperforms state-of-the-art methods. NOISeq is a comprehensive resource that meets current needs for robust data-aware analysis of RNA-seq differential expression.
Affiliation(s)
- Sonia Tarazona
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Eduardo Primo Yúfera 3, 46012, Valencia, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, Camí de Vera, 46022, Valencia, Spain
- Pedro Furió-Tarí
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Eduardo Primo Yúfera 3, 46012, Valencia, Spain
- David Turrà
  - Department of Genetics, Universidad de Córdoba, Campus de Rabanales Edificio Gregor Mendel, 14071, Córdoba, Spain
- Antonio Di Pietro
  - Department of Genetics, Universidad de Córdoba, Campus de Rabanales Edificio Gregor Mendel, 14071, Córdoba, Spain
- María José Nueda
  - Statistics and Operational Research Department, Universidad de Alicante, Carretera San Vicente del Raspeig s/n, 03690, Alicante, Spain
- Alberto Ferrer
  - Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, Camí de Vera, 46022, Valencia, Spain
- Ana Conesa
  - Genomics of Gene Expression Lab, Centro de Investigación Príncipe Felipe, Eduardo Primo Yúfera 3, 46012, Valencia, Spain
  - Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL 32603, USA
5
Williams AG, Thomas S, Wyman SK, Holloway AK. RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis. Curr Protoc Hum Genet 2014; 83:11.13.1-20. [PMID: 25271838 DOI: 10.1002/0471142905.hg1113s83] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 11/09/2022]
Abstract
RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development.
6
Mason CE, Porter SG, Smith TM. Characterizing multi-omic data in systems biology. Adv Exp Med Biol 2014; 799:15-38. [PMID: 24292960 DOI: 10.1007/978-1-4614-8778-4_2] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
In today's biology, studies have shifted to analyzing systems over discrete biochemical reactions and pathways. These studies depend on combining the results from scores of experimental methods that analyze DNA; mRNA; noncoding RNAs, DNA, RNA, and protein interactions; and the nucleotide modifications that form the epigenome into global datasets that represent a diverse array of "omics" data (transcriptional, epigenetic, proteomic, metabolomic). The methods used to collect these data consist of high-throughput data generation platforms that include high-content screening, imaging, flow cytometry, mass spectrometry, and nucleic acid sequencing. Of these, the next-generation DNA sequencing platforms predominate because they provide an inexpensive and scalable way to quickly interrogate the molecular changes at the genetic, epigenetic, and transcriptional level. Furthermore, existing and developing single-molecule sequencing platforms will likely make direct RNA and protein measurements possible, thus increasing the specificity of current assays and making it possible to better characterize "epi-alterations" that occur in the epigenome and epitranscriptome. These diverse data types present us with the largest challenge: how do we develop software systems and algorithms that can integrate these datasets and begin to support a more democratic model where individuals can capture and track their own medical information through biometric devices and personal genome sequencing? Such systems will need to provide the necessary user interactions to work with the trillions of data points needed to make scientific discoveries. Here, we describe novel approaches in the genesis and processing of such data, models to integrate these data, and the increasing ubiquity of self-reporting and self-measured genomics and health data.
Affiliation(s)
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, USA
7
Lucas Lledó JI, Cáceres M. On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing. PLoS One 2013; 8:e61292. [PMID: 23637806 PMCID: PMC3634047 DOI: 10.1371/journal.pone.0061292] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 12/21/2012] [Accepted: 03/07/2013] [Indexed: 12/15/2022] Open
Abstract
One of the most widely used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is well suited to detecting a wide range of inversions, even with low coverage data. However, ≥% of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.
Affiliation(s)
- José Ignacio Lucas Lledó
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
- Mario Cáceres
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
8
Abstract
Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
Affiliation(s)
- Niranjan Nagarajan
  - Computational and Systems Biology, Genome Institute of Singapore, 138672 Singapore
9
Robles JA, Qureshi SE, Stephen SJ, Wilson SR, Burden CJ, Taylor JM. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC Genomics 2012; 13:484. [PMID: 22985019 PMCID: PMC3560154 DOI: 10.1186/1471-2164-13-484] [Citation(s) in RCA: 136] [Impact Index Per Article: 11.3] [Received: 07/20/2012] [Accepted: 08/10/2012] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. RESULTS: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. CONCLUSIONS: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
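To make the idea of reduced sequencing depth concrete, the sketch below thins a vector of per-gene read counts so that each read is kept independently with a fixed probability (binomial thinning). This is one common way to emulate lower depth in silico and is an assumption here; the abstract does not state which procedure the authors used. The function name downsample_counts and the example counts are hypothetical.

```python
import random


def downsample_counts(counts, fraction, rng):
    """Binomially thin read counts: keep each read with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    # For every gene, draw one uniform number per read and count the survivors.
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


# Toy per-gene counts at full depth (made-up values, not data from the paper).
full_depth = [1200, 40, 0, 315]
rng = random.Random(0)  # fixed seed so the thinning is reproducible
reduced = downsample_counts(full_depth, 0.15, rng)
```

Because each read is kept independently, a gene's expected count scales with the fraction while low-count genes become noisier, which is the effect a power analysis at reduced depth has to account for.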
Affiliation(s)
- José A Robles
  - CSIRO Plant Industry, Black Mountain Laboratories, Canberra, Australia
10
Aldridge S, Hadfield J. Introduction to miRNA profiling technologies and cross-platform comparison. Methods Mol Biol 2012; 822:19-31. [PMID: 22144189 DOI: 10.1007/978-1-61779-427-8_2] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 05/31/2023]
Abstract
MicroRNA analysis has been widely adopted for basic and applied science. The tools and technologies available for quantifying and analysing miRNAs are still maturing. Here, we give an introductory overview of the main tools and the challenges in their use. We also discuss the importance of basic experimental design, sample handling and analysis methods, as the impact of these can be as profound as the choice of miRNA analysis platform. Whether the reader is interested in a gene-by-gene or genome-wide approach, choosing the platform to use is not trivial. Careful thought given before starting an experiment will make the execution much easier.
11
Liang L, Tan X, Juarez S, Villaverde H, Pablo J, Nakajima-Sasaki R, Gotuzzo E, Saito M, Hermanson G, Molina D, Felgner S, Morrow WJW, Liang X, Gilman RH, Davies DH, Tsolis RM, Vinetz JM, Felgner PL. Systems biology approach predicts antibody signature associated with Brucella melitensis infection in humans. J Proteome Res 2011; 10:4813-24. [PMID: 21863892 PMCID: PMC3189706 DOI: 10.1021/pr200619r] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 01/02/2023]
Abstract
A complete understanding of the factors that determine selection of antigens recognized by the humoral immune response following infectious agent challenge is lacking. Here we illustrate a systems biology approach to identify the antibody signature associated with Brucella melitensis (Bm) infection in humans and predict proteomic features of serodiagnostic antigens. By taking advantage of a full proteome microarray expressing previously cloned 1406 and newly cloned 1640 Bm genes, we were able to identify 122 immunodominant antigens and 33 serodiagnostic antigens. The reactive antigens were then classified according to annotated functional features (COGs), computationally predicted features (e.g., subcellular localization, physical properties), and protein expression estimated by mass spectrometry (MS). Enrichment analyses indicated that membrane association and secretion were significant enriching features of the reactive antigens, as were proteins predicted to have a signal peptide, a single transmembrane domain, and outer membrane or periplasmic location. These features accounted for 67% of the serodiagnostic antigens. An overlay of the seroreactive antigen set with proteomic data sets generated by MS identified an additional 24%, suggesting that protein expression in bacteria is an additional determinant in the induction of Brucella-specific antibodies. This analysis indicates that one-third of the proteome contains enriching features that account for 91% of the antigens recognized, and after B. melitensis infection the immune system develops significant antibody titers against 10% of the proteins with these enriching features. This systems biology approach provides an empirical basis for understanding the breadth and specificity of the immune response to B. melitensis and a new framework for comparing the humoral responses against other microorganisms.
Affiliation(s)
- Li Liang
  - Department of Medicine, Division of Infectious Diseases, University of California, Irvine, California 92697, United States
12
New approaches to Prunus transcriptome analysis. Genetica 2011; 139:755-69. [PMID: 21584650 DOI: 10.1007/s10709-011-9580-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/14/2010] [Accepted: 04/26/2011] [Indexed: 12/11/2022]
Abstract
The recent sequencing of the complete genome of the peach offers new opportunities for further transcriptomic studies in Prunus species in the so-called post-genomics era. The first work on transcriptome analysis in Prunus species started in the early 2000s with the development of ESTs (expressed sequence tags) and the analysis of several candidate genes. Later, new strategies for massive (high-throughput) analysis of transcriptomes were applied, producing larger amounts of data on the expression of a large number of genes in a single experiment. One of these systems is massive transcriptome analysis using cDNA biochips (microarrays) to analyze thousands of genes by hybridization of mRNA labelled with fluorescence. However, the recent emergence of a massive sequencing methodology ("deep-sequencing") of the transcriptome (RNA-Seq), based on lowering the costs of DNA (in this case complementary DNA, cDNA) sequencing, could be more suitable than the application of microarrays. Recent papers have described the tremendous power of this technology, both in terms of profiling coverage and quantitative accuracy in transcriptomic studies. Now this technology is being applied to plant species, including Prunus. In this work, we analyze the potential of using RNA-Seq technology in the study of Prunus transcriptomes and the development of genomic tools. In addition, the strengths and limitations of RNA-Seq relative to microarray profiling are discussed.
13
Abstract
Filamentous fungi have a high-capacity secretory system and are therefore widely exploited for the industrial production of native and heterologous proteins. However, in most cases, the yields of nonfungal proteins are significantly lower than those obtained for fungal proteins. One well-studied bottleneck appears to be the result of slow or aberrant folding of heterologous proteins in the endoplasmic reticulum (ER) during the early stages of secretion, leading to stress responses in the host, including the unfolded protein response (UPR). Most of the key elements constituting the signal transduction pathway of the UPR in Saccharomyces cerevisiae have been identified in filamentous fungi, including the central activation mechanism of the pathway, that is, the stress-induced splicing of an unconventional (nonspliceosomal) intron in orthologs of the HAC1 mRNA. This splicing event relieves a translational block in the HAC1 mRNA, allowing for the translation of the bZIP transcription factor Hac1p that regulates the expression of UPR target genes. The UPR is involved in regulating the folding, yield, and delivery of secretory proteins, which has consequences for fungal lifestyles, including virulence and biotechnology. The recent releases of genome sequences of several species of filamentous fungi and the availability of DNA arrays, GeneChips, and deep sequencing methodologies have provided an unprecedented resource for exploring expression profiles in response to secretion stresses. Furthermore, genome-wide investigation of translation profiles through polysome analyses is possible, and here, we outline methods for the use of such techniques with filamentous fungi and, principally, Aspergillus niger. We also describe methods for the batch and controlled cultivation of A. niger and for the replacement and study of its hacA gene, which provides either a UPR-deficient strain or a constitutively activated UPR strain for comparative analysis with its wild type. Although we focus on A. niger, the utility of the hacA-deletion strategy is also described for use in investigating the virulence of the plant pathogen Alternaria brassicicola.
14
Abstract
Next-generation sequencing technologies are being rapidly applied in biological research. Tens of millions of short sequences generated in a single experiment provide us with enormous information on genome composition, genetic variants, gene expression levels and protein binding sites, depending on the application. Various methods are being developed for analyzing the data generated by these technologies. However, the relevant experimental design issues have rarely been discussed. In this review, we use RNA-seq as an example to bring this topic into focus and to discuss experimental design and validation issues pertaining to next-generation sequencing in the quantification of transcripts.
Affiliation(s)
- Zhide Fang
  - Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, USA
15
Wetzel J, Kingsford C, Pop M. Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics 2011; 12:95. [PMID: 21486487 PMCID: PMC3103447 DOI: 10.1186/1471-2105-12-95] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 12/17/2010] [Accepted: 04/13/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Next-generation sequencing technologies allow genomes to be sequenced more quickly and less expensively than ever before. However, as sequencing technology has improved, the difficulty of de novo genome assembly has increased, due in large part to the shorter reads generated by the new technologies. The use of mated sequences (referred to as mate-pairs) is a standard means of disambiguating assemblies to obtain a more complete picture of the genome without resorting to manual finishing. Here, we examine the effectiveness of mate-pair information in resolving repeated sequences in the DNA (a paramount issue to overcome). While it has been empirically accepted that mate-pairs improve assemblies, and a variety of assemblers use mate-pairs in the context of repeat resolution, the effectiveness of mate-pairs in this context has not been systematically evaluated in previous literature. RESULTS: We show that, in high-coverage prokaryotic assemblies, libraries of short mate-pairs (about 4-6 times the read-length) more effectively disambiguate repeat regions than the libraries that are commonly constructed in current genome projects. We also demonstrate that the best assemblies can be obtained by 'tuning' mate-pair libraries to accommodate the specific repeat structure of the genome being assembled - information that can be obtained through an initial assembly using unpaired reads. These results are shown across 360 simulations on 'ideal' prokaryotic data as well as assembly of 8 bacterial genomes using SOAPdenovo. The simulation results provide an upper-bound on the potential value of mate-pairs for resolving repeated sequences in real prokaryotic data sets. The assembly results show that our method of tuning mate-pairs exploits fundamental properties of these genomes, leading to better assemblies even when using an off-the-shelf assembler in the presence of base-call errors.
CONCLUSIONS Our results demonstrate that dramatic improvements in prokaryotic genome assembly quality can be achieved by tuning mate-pair sizes to the actual repeat structure of a genome, suggesting the possible need to change the way sequencing projects are designed. We propose that a two-tiered approach - first generate an assembly of the genome with unpaired reads in order to evaluate the repeat structure of the genome; then generate the mate-pair libraries that provide most information towards the resolution of repeats in the genome being assembled - is not only possible, but likely also more cost-effective as it will significantly reduce downstream manual finishing costs. In future work we intend to address the question of whether this result can be extended to larger eukaryotic genomes, where repeat structure can be quite different.
Affiliation(s)
- Joshua Wetzel
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
|
16
|
Duitama J, Kennedy J, Dinakar S, Hernández Y, Wu Y, Măndoiu II. Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads. BMC Bioinformatics 2011; 12 Suppl 1:S53. [PMID: 21342586 PMCID: PMC3044311 DOI: 10.1186/1471-2105-12-s1-s53] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Recent technology advances have enabled sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for performing whole-genome SNP genotyping. RESULTS In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integration of LD information results in genotype calling accuracy from low-coverage sequencing data comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/. CONCLUSIONS Integration of LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, rendering low-coverage sequencing a viable alternative to microarrays for conducting large-scale genome-wide association studies.
Affiliation(s)
- Jorge Duitama
- Department of Computer Science & Engineering, University of Connecticut, 371 Fairfield Rd, Unit 2155, Storrs, CT 06269-2155, USA.
|
17
|
Chen S, Yang P, Jiang F, Wei Y, Ma Z, Kang L. De novo analysis of transcriptome dynamics in the migratory locust during the development of phase traits. PLoS One 2010; 5:e15633. [PMID: 21209894 PMCID: PMC3012706 DOI: 10.1371/journal.pone.0015633] [Citation(s) in RCA: 189] [Impact Index Per Article: 13.5] [Received: 10/12/2010] [Accepted: 11/15/2010] [Indexed: 12/27/2022] Open
Abstract
Locusts exhibit remarkable density-dependent phenotype (phase) changes from the solitary to the gregarious, making them one of the most destructive agricultural pests. This phenotype polyphenism arises from a single genome and diverse transcriptomes in different conditions. Here we report a de novo transcriptome for the migratory locust and a comprehensive, representative core gene set. We carried out assembly of 21.5 Gb Illumina reads, generated 72,977 transcripts with N50 2,275 bp and identified 11,490 locust protein-coding genes. Comparative genomics analysis with eight other sequenced insects was carried out to identify the genomic divergence between hemimetabolous and holometabolous insects for the first time, and 18 genes relevant to development were found. We further utilized the quantitative feature of RNA-seq to measure and compare gene expression among libraries. We first discovered how divergence in gene expression between two phases progresses as locusts develop and identified 242 transcripts as candidates for phase marker genes. Together with the detailed analysis of deep sequencing data of the 4th instar, we discovered a phase-dependent divergence of biological investment at the molecular level. Solitary locusts have higher activity in biosynthetic pathways while gregarious locusts show higher activity in environmental interaction, in which genes and pathways associated with regulation of neurotransmitter activities, such as neurotransmitter receptors, synthetase, transporters, and GPCR signaling pathways, are strongly involved. Our study, as the largest de novo transcriptome to date, with optimization of sequencing and assembly strategy, can further facilitate the application of de novo transcriptome. The locust transcriptome enriches genetic resources for hemimetabolous insects and our understanding of the origin of insect metamorphosis.
Most importantly, we identified genes and pathways that might be involved in locust development and phase change, and may thus benefit pest management.
Affiliation(s)
- Shuang Chen
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Pengcheng Yang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Feng Jiang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yuanyuan Wei
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Zongyuan Ma
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Le Kang
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
|